Restricted Boltzmann Machines

Theory of RBMs and Applications

Authors

Jessica Wells and Jason Gerstenberger (Advisor: Dr. Cohen)

Published

March 16, 2025

Introduction

Background

Restricted Boltzmann Machines (RBMs) are a type of neural network that has been around since the 1980s. As a reminder to the reader, machine learning is generally divided into three categories: supervised learning (examples: classification, regression), unsupervised learning (examples: clustering, dimensionality reduction, generative modeling), and reinforcement learning (examples: gaming/robotics). RBMs are primarily used for unsupervised learning tasks such as dimensionality reduction and feature extraction, which help prepare datasets for models that may later be trained with supervised learning. They also have other applications, which are discussed later.

Like Hopfield networks, Boltzmann machines are undirected graphical models, but they differ in that they are stochastic and can have hidden units. Both models are energy-based, meaning they learn by minimizing an energy function (Smolensky et al. 1986). Boltzmann machines use a sigmoid activation function, which makes the model probabilistic.

In the “Restricted” Boltzmann Machine, there are no interactions between neurons within the visible layer or within the hidden layer, so the neurons form a bipartite graph. Below is a diagram taken from Goodfellow et al. (Goodfellow, Bengio, and Courville 2016, p. 577) for visualization of the connections.

Code
reticulate::py_config()
python:         /Users/jessicawells/.virtualenvs/r-reticulate/bin/python
libpython:      /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/config-3.12-darwin/libpython3.12.dylib
pythonhome:     /Users/jessicawells/.virtualenvs/r-reticulate:/Users/jessicawells/.virtualenvs/r-reticulate
version:        3.12.6 (v3.12.6:a4a2d2b0d85, Sep  6 2024, 16:08:03) [Clang 13.0.0 (clang-1300.0.29.30)]
numpy:          /Users/jessicawells/.virtualenvs/r-reticulate/lib/python3.12/site-packages/numpy
numpy_version:  1.26.1


Goodfellow et al. discuss the expense of drawing samples from most undirected graphical models; the RBM, however, permits block Gibbs sampling (p. 578), in which the network alternates between sampling all hidden units simultaneously and sampling all visible units simultaneously. Derivatives are also simplified by the fact that the energy function of the RBM is a linear function of its parameters, as will be seen in Methods.

RBMs are trained using a process called Contrastive Divergence (CD) (G. E. Hinton 2002), in which the weights are updated to minimize the difference between samples from the data and samples from the model. Learning rate, batch size, and number of hidden units are all hyperparameters that affect whether training converges and how well the underlying structure of the data is learned.

Applications

RBMs are probably best known for their success in collaborative filtering. An RBM model was used in the Netflix Prize competition to predict user ratings for movies, where it outperformed the Singular Value Decomposition (SVD) method that was state of the art at the time (Salakhutdinov, Mnih, and Hinton 2007). RBMs have also been trained to recognize handwritten digits, such as those in the MNIST dataset (G. E. Hinton 2002).

RBMs have also been used successfully to distinguish normal from anomalous network traffic, which makes them promising for improving network security. Progress in network anomaly detection has been slow, partly because datasets for training and testing are hard to obtain: clients are often reluctant to divulge information that could expose their networks. On a real-life dataset in which one host carried normal traffic and another was infected by a bot, a discriminative RBM (DRBM) successfully separated the normal from the anomalous traffic. The DRBM does not rely on knowing the data distribution ahead of time, which is useful, but this also makes it prone to overfitting: when the same trained model was applied to the KDD ’99 dataset, performance declined (Fiore et al. 2013).

RBMs can also greatly improve the classification of brain disorders in MRI images. Generative Adversarial Networks (GANs) use two neural networks: a generator, which generates fake data, and a discriminator, which tries to distinguish between real and fake data. Loss from the discriminator is backpropagated through the generator so that both parts are trained simultaneously. The RBM-GAN uses RBM features extracted from real MRI images as inputs to the generator; features from the discriminator are then used as inputs to a classifier (Aslan, Dogan, and Koca 2023).

The many-body quantum wavefunction, which describes the quantum state of a system of particles, is difficult to compute with classical computers. RBMs have been used to approximate it with variational Monte Carlo methods (Melko et al. 2019).

RBMs are notoriously slow to train: computing the activation probabilities requires many vector dot products. Lean Contrastive Divergence (LCD) adds two techniques to speed up RBM training. The first is bounds-based filtering, in which upper and lower bounds on the activation probability are used so that only a subset of the dot products need to be computed in full. The second is the delta product, which recalculates only the changed portions of a vector dot product (Ning, Pittman, and Shen 2018).
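To make the delta-product idea concrete, the sketch below, our own illustration rather than code from the paper, updates a cached weight–vector dot product by touching only the visible units that changed between steps:

Code
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(784, 256))        # visible x hidden weights
v_old = rng.integers(0, 2, size=784)   # previous binary visible state
v_new = v_old.copy()
v_new[rng.choice(784, size=20, replace=False)] ^= 1  # 20 units flip

# Full recomputation: O(n_visible * n_hidden)
full = v_new @ W

# Delta product: reuse the cached result and correct only the flipped units
cached = v_old @ W
changed = np.nonzero(v_new != v_old)[0]
delta = cached + (v_new[changed] - v_old[changed]) @ W[changed, :]

assert np.allclose(full, delta)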

Methods

Below is the energy function of the RBM.

\[ E(v,h) = - \sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i} \sum_{j} v_i w_{i,j} h_j \tag{1}\] where $v_i$ and $h_j$ represent visible and hidden units; $a_i$ and $b_j$ are the bias terms of the visible and hidden units; and each weight $w_{i,j}$ represents the interaction between visible unit $i$ and hidden unit $j$ (Fischer and Igel 2012).
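As a quick numerical illustration of Equation 1 (a toy sketch with made-up dimensions, separate from our model code):

Code
import numpy as np

rng = np.random.default_rng(42)
n_visible, n_hidden = 6, 4
v = rng.integers(0, 2, size=n_visible)      # binary visible state
h = rng.integers(0, 2, size=n_hidden)       # binary hidden state
a = rng.normal(size=n_visible)              # visible biases
b = rng.normal(size=n_hidden)               # hidden biases
W = rng.normal(size=(n_visible, n_hidden))  # interaction weights

# E(v, h) = -a.v - b.h - v' W h  (Equation 1)
energy = -a @ v - b @ h - v @ W @ h
print(energy)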

It is well known that neural networks are prone to overfitting, and techniques such as early stopping are often employed to prevent it. Methods to prevent overfitting in RBMs include weight decay (L2 regularization), dropout, dropconnect, and weight uncertainty (Zhang et al. 2018). Dropout is a fairly well-known concept in deep learning: a dropout value of 0.3 on a layer means 30% of its neurons are dropped during training, which prevents the network from relying too heavily on any particular neuron. L2 regularization is also commonly employed in deep learning; it penalizes large weights to encourage generalization. Dropconnect randomly sets a subset of weights within the network to zero during training. Weight uncertainty gives each weight in the network its own probability distribution rather than a fixed value, which allows the network to learn more useful features.
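For concreteness, a minimal PyTorch sketch of two of these techniques, dropout and L2 weight decay (the layer sizes and hyperparameter values here are arbitrary illustrative choices):

Code
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.3),   # randomly zero 30% of activations during training
    nn.Linear(256, 10),
)

# weight_decay adds an L2 penalty on the weights to the loss
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)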

If the learning rate is too high, training of the model may not converge; if it is too low, training may take a long time. To get the most out of training, it is helpful to reduce the learning rate over time, which is known as learning rate decay (G. Hinton 2010).
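A minimal sketch of learning rate decay using PyTorch’s built-in schedulers (the model, optimizer, and decay factor are arbitrary illustrative choices):

Code
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import ExponentialLR

model = nn.Linear(784, 10)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = ExponentialLR(optimizer, gamma=0.95)  # lr *= 0.95 each epoch

for epoch in range(5):
    # ... forward/backward pass over the training data would go here ...
    optimizer.step()   # placeholder step so the example runs warning-free
    scheduler.step()   # decay the learning rate after each epoch
    print(f"epoch {epoch}: lr = {scheduler.get_last_lr()[0]:.5f}")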

Logistic Regression

One technique we explore is standardizing the Fashion MNIST features (pixels), training an RBM (unsupervised learning) to extract hidden features from the visible layer, and then feeding these features into a logistic regression model instead of feeding it the raw pixels. The hidden features from the RBM are standardized again before being used as inputs to the logistic regression classifier. We then use the trained logistic regression model to predict labels for the test data, evaluating how well the RBM-derived features perform in a supervised classification task. It is helpful to remind the reader of the methodology behind logistic regression.

\[ P(Y = k | X) = \frac{e^{\beta_{0k} + \beta_k^T X}}{\sum_{l=1}^{K} e^{\beta_{0l} + \beta_l^T X}} \tag{2}\]

Mathematically, the concept behind binary logistic regression is the logit, the natural logarithm of an odds ratio (Peng, Lee, and Ingersoll 2002). However, since we have 10 labels, our classification task falls under multinomial logistic regression, whose class probabilities are given by Equation 2.
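To connect Equation 2 to code, the following minimal sketch (with random, purely illustrative parameters) evaluates the softmax class probabilities for a single input:

Code
import numpy as np

rng = np.random.default_rng(0)
K, p = 10, 784                         # 10 classes, 784 pixel features
beta0 = rng.normal(size=K)             # intercepts beta_{0k}
beta = rng.normal(size=(K, p))         # coefficient vectors beta_k
x = rng.random(p)                      # one standardized input

logits = beta0 + beta @ x              # beta_{0k} + beta_k^T x for each k
probs = np.exp(logits - logits.max())  # subtract max for numerical stability
probs /= probs.sum()                   # Equation 2: softmax over K classes
print(probs.sum(), probs.argmax())     # probabilities sum to 1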

Below is our process for creating the RBM:

Step 1: We first initialize the RBM with random weights and biases, set the number of visible units to 784 and the number of hidden units to 256, and set the number of contrastive divergence steps (k) to 1.
Step 2: Sample hidden units from visible. The math behind computing the hidden unit activations from the given input can be seen in Equation 3 (Fischer and Igel 2012), where the probability parameterizes a Bernoulli draw for each hidden unit. (Note that Equations 3 and 4 follow Fischer and Igel’s notation, in which $i$ indexes hidden units with biases $c_i$ and $j$ indexes visible units with biases $b_j$; the roles of $i$, $j$, and $b$ are swapped relative to Equation 1.)
\[ p(H_i = 1 | \mathbf{v}) = \sigma \left( \sum_{j=1}^{m} w_{ij} v_j + c_i \right) \tag{3}\] Step 3: Sample visible units from hidden. The math behind computing visible unit activations from the hidden layer can be seen in Equation 4 (Fischer and Igel 2012). Visible states are sampled using the Bernoulli distribution; this way we can see how well the RBM reconstructs its inputs.
\[ p(V_j = 1 | \mathbf{h}) = \sigma \left( \sum_{i=1}^{n} w_{ij} h_i + b_j \right) \tag{4}\]

Step 4: Run k = 1 steps of Contrastive Divergence (feed forward, feed backward), which executes Steps 2 and 3. Contrastive Divergence updates the RBM’s weights by minimizing the difference between the original input and the input reconstructed by the RBM.
Step 5: Free energy is computed. The free energy F is given by the negative logarithm of the partition function Z (Oh, Baggag, and Nha 2020), where the partition function is
\[ Z(\theta) \equiv \sum_{v,h} e^{-E(v,h; \theta)} \tag{5}\] and the free energy function is
\[ F(\theta) = -\ln Z(\theta) \tag{6}\] where a lower free energy for a visible state means the RBM has learned to model it well (a sketch of how free energy is computed in practice follows after Step 7).

Step 6: Train the RBM. The model weights are updated via gradient descent.
Step 7: Feature extraction for classification with LR. The hidden layer activations of the RBM are used as features for LR.
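The sketch below condenses Steps 1 through 6 into a minimal PyTorch RBM with one step of Contrastive Divergence. The dimensions follow Step 1, and the biases follow Equation 1’s naming ($a$ visible, $b$ hidden); the helper functions and the toy batch are our own illustrative choices, not the exact implementation behind our results:

Code
import torch

torch.manual_seed(0)
n_visible, n_hidden, lr = 784, 256, 0.01

# Step 1: random weights, zero biases
W = 0.01 * torch.randn(n_visible, n_hidden)
a = torch.zeros(n_visible)   # visible biases
b = torch.zeros(n_hidden)    # hidden biases

def sample_h(v):
    # Step 2 / Equation 3: Bernoulli sample of hidden units given visible
    p = torch.sigmoid(v @ W + b)
    return p, torch.bernoulli(p)

def sample_v(h):
    # Step 3 / Equation 4: Bernoulli sample of visible units given hidden
    p = torch.sigmoid(h @ W.t() + a)
    return p, torch.bernoulli(p)

def cd1_update(v0):
    # Step 4: k = 1 Gibbs step (feed forward, feed backward)
    global W, a, b
    ph0, h0 = sample_h(v0)
    pv1, v1 = sample_v(h0)
    ph1, _ = sample_h(v1)
    # Step 6: CD-1 update (positive phase minus negative phase)
    n = v0.size(0)
    W += lr * (v0.t() @ ph0 - v1.t() @ ph1) / n
    a += lr * (v0 - v1).mean(dim=0)
    b += lr * (ph0 - ph1).mean(dim=0)

# one toy update on a random binary "batch" of 32 fake images
cd1_update(torch.bernoulli(torch.rand(32, n_visible)))

For Step 5, the sum in Equation 5 runs over every joint configuration and is intractable at this scale, so what is typically computed in practice is the free energy of a single visible vector, obtained by summing out the hidden units analytically. A minimal sketch reusing the names above (again our own illustration):

Code
import torch.nn.functional as F

def free_energy(v):
    # F(v) = -a.v - sum_j softplus(b_j + (v W)_j), hidden units summed out
    return -(v @ a) - F.softplus(v @ W + b).sum(dim=-1)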

Hyperparameter Tuning

We use the Tree-structured Parzen Estimator algorithm from Optuna (Akiba et al. 2019) to tune the hyperparameters of the RBM and the classifier models, and we use MLFlow (Zaharia et al. 2018) to record and visualize the results of the hyperparameter tuning process. The hyperparameters we tune include the learning rate, batch size, number of hidden units, and number of epochs.

Analysis and Results

Data Exploration and Visualization

We use the Fashion MNIST dataset from Zalando Research (Xiao, Rasul, and Vollgraf 2017). The set includes 70,000 grayscale images of clothing items: 60,000 for training and 10,000 for testing. Each image is 28x28 pixels (784 pixels total), and each pixel has an integer value ranging from 0 (white) to 255 (darkest). There are 785 columns in total, as one column is dedicated to the label.


There are 10 labels in total:

0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot

Below we load the dataset.

Code
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
import torch
import torchvision.datasets
import torchvision.models
import torchvision.transforms as transforms
import matplotlib.pyplot as plt



train_data = torchvision.datasets.FashionMNIST(
    root="./data", 
    train=True, 
    download=True, 
    transform=transforms.ToTensor()  # Converts to tensor but does NOT normalize
)

test_data = torchvision.datasets.FashionMNIST(
    root="./data", 
    train=False, 
    download=True, 
    transform=transforms.ToTensor()  
)

Get the seventh image to show a sample

Code
# Extract the seventh image, index 6 (or choose any index)
image_tensor, label = train_data[6]  # shape: [1, 28, 28]

# Convert tensor to NumPy array
image_array = image_tensor.numpy().squeeze()  

# Plot the image
plt.figure(figsize=(5,5))
plt.imshow(image_array, cmap="gray")
plt.title(f"FashionMNIST Image (Label: {label})")
plt.axis("off")  # Hide axes
plt.show()

Code
train_images = train_data.data.numpy()  # Raw pixel values (0-255)
train_labels = train_data.targets.numpy()
X = train_images.reshape(-1, 784)  # Flatten 28x28 images into 1D (60000, 784)
Code
#print(train_images[:5])
flattened = train_images[:5].reshape(5, -1) 

# Create a DataFrame
df_flat = pd.DataFrame(flattened)
print(df_flat.head())
   0    1    2    3    4    5    6    ...  777  778  779  780  781  782  783
0    0    0    0    0    0    0    0  ...    0    0    0    0    0    0    0
1    0    0    0    0    0    1    0  ...   76    0    0    0    0    0    0
2    0    0    0    0    0    0    0  ...    0    0    0    0    0    0    0
3    0    0    0    0    0    0    0  ...    0    0    0    0    0    0    0
4    0    0    0    0    0    0    0  ...    0    0    0    0    0    0    0

[5 rows x 784 columns]
Code
#train_df.info() #datatypes are integers

There are no missing values in the data.

Code
print(np.isnan(train_images).any()) 
False

There appears to be no class imbalance:

Code
unique_labels, counts = np.unique(train_labels, return_counts=True)

# Print the counts sorted by label
for label, count in zip(unique_labels, counts):
    print(f"Label {label}: {count}")
Label 0: 6000
Label 1: 6000
Label 2: 6000
Label 3: 6000
Label 4: 6000
Label 5: 6000
Label 6: 6000
Label 7: 6000
Label 8: 6000
Label 9: 6000
Code
print(f"X shape: {X.shape}")
X shape: (60000, 784)

t-SNE Visualization

Code
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Run t-SNE to reduce the 784-dimensional pixel space to 2-D
tsne = TSNE(n_jobs=-1, random_state=42)  # n_jobs=-1 uses all available cores
embeddings = tsne.fit_transform(X)


# Create scatter plot
figure = plt.figure(figsize=(15,7))
plt.scatter(embeddings[:, 0], embeddings[:, 1], c=train_labels,
            cmap=plt.cm.get_cmap("jet", 10), marker='.')
plt.colorbar(ticks=range(10))
plt.clim(-0.5, 9.5)
plt.title("t-SNE Visualization of Fashion MNIST")
plt.show()

Modeling and Results

Our Models
1. Logistic Regression on Fashion MNIST Data
2. Feed Forward Network on Fashion MNIST Data
3. Convolutional Neural Network on Fashion MNIST Data
4. Logistic Regression on RBM Hidden Features (of Fashion MNIST Data)
5. Feed Forward Network on RBM Hidden Features (of Fashion MNIST Data)

Note: Outputs (50 trials) and Code can be collapsed by the reader

Click to Show Code and Output

Import Libraries and Re-load data for first 3 models

Code
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms
import numpy as np
import mlflow
import optuna
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from torch.utils.data import DataLoader

# Set device (Apple-silicon GPU if available, otherwise CPU)
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Load Fashion-MNIST dataset again for the first 3 models
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.FashionMNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.FashionMNIST(root='./data', train=False, transform=transform, download=True)
Model 1: Logistic Regression on Fashion MNIST Data
Click to Show Code and Output
Code
CLASSIFIER = "LogisticRegression"  # Change for FNN, LogisticRegression, or CNN

# Set MLflow experiment name
if CLASSIFIER == "LogisticRegression":
    experiment = mlflow.set_experiment("pytorch-fmnist-lr-noRBM")
elif CLASSIFIER == "FNN":
    experiment = mlflow.set_experiment("pytorch-fmnist-fnn-noRBM")
elif CLASSIFIER == "CNN":
    experiment = mlflow.set_experiment("pytorch-fmnist-cnn-noRBM")

# Define CNN model
class FashionCNN(nn.Module):
    def __init__(self, filters1, filters2, kernel1, kernel2):
        super(FashionCNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=filters1, kernel_size=kernel1, padding=1),
            nn.BatchNorm2d(filters1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=filters1, out_channels=filters2, kernel_size=kernel2),
            nn.BatchNorm2d(filters2),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.fc1 = None
        self.drop = nn.Dropout2d(0.25)
        self.fc2 = nn.Linear(in_features=600, out_features=120)
        self.fc3 = nn.Linear(in_features=120, out_features=10)
        

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)  # flatten to (batch, features)
        # fc1 is built lazily on the first forward pass, once the flattened
        # size (which depends on the kernel sizes) is known; note that an
        # optimizer constructed before this first pass will not see fc1's
        # parameters
        if self.fc1 is None:
            self.fc1 = nn.Linear(out.shape[1], 600).to(x.device)
        out = self.fc1(out)
        out = self.drop(out)
        out = self.fc2(out)
        out = self.fc3(out)
        return out




# Define Optuna objective function
def objective(trial):
    batch_size = trial.suggest_int("batch_size", 64, 256, step=32)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

    mlflow.start_run(experiment_id=experiment.experiment_id)
    num_classifier_epochs = trial.suggest_int("num_classifier_epochs", 5, 5) 
    mlflow.log_param("num_classifier_epochs", num_classifier_epochs)

    if CLASSIFIER == "FNN":
        hidden_size = trial.suggest_int("fnn_hidden", 192, 384)
        learning_rate = trial.suggest_float("learning_rate", 0.0001, 0.0025)

        mlflow.log_param("classifier", "FNN")
        mlflow.log_param("fnn_hidden", hidden_size)
        mlflow.log_param("learning_rate", learning_rate)

        model = nn.Sequential(
            nn.Linear(784, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 10)
        ).to(device)

        optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    elif CLASSIFIER == "CNN":
        filters1 = trial.suggest_int("filters1", 16, 64, step=16)
        filters2 = trial.suggest_int("filters2", 32, 128, step=32)
        kernel1 = trial.suggest_int("kernel1", 3, 5)
        kernel2 = trial.suggest_int("kernel2", 3, 5)
        learning_rate = trial.suggest_float("learning_rate", 0.0001, 0.0025)

        mlflow.log_param("classifier", "CNN")
        mlflow.log_param("filters1", filters1)
        mlflow.log_param("filters2", filters2)
        mlflow.log_param("kernel1", kernel1)
        mlflow.log_param("kernel2", kernel2)
        mlflow.log_param("learning_rate", learning_rate)

        model = FashionCNN(filters1, filters2, kernel1, kernel2).to(device)
        optimizer = optim.Adam(model.parameters(), lr=learning_rate)

      
    elif CLASSIFIER == "LogisticRegression":
        mlflow.log_param("classifier", "LogisticRegression")
    
        # Prepare data for Logistic Regression (Flatten 28x28 images to 784 features)
        train_features = train_dataset.data.view(-1, 784).numpy()
        train_labels = train_dataset.targets.numpy()
        test_features = test_dataset.data.view(-1, 784).numpy()
        test_labels = test_dataset.targets.numpy()
    
        # Normalize the pixel values to [0,1] for better convergence
        train_features = train_features / 255.0
        test_features = test_features / 255.0
    
    
        C = trial.suggest_float("C", 0.01, 10.0, log=True)  
        solver = "saga" 
    
        model = LogisticRegression(C=C, max_iter=num_classifier_epochs, solver=solver)
        model.fit(train_features, train_labels)
    
    
        predictions = model.predict(test_features)
        accuracy = accuracy_score(test_labels, predictions) * 100
        print(f"Logistic Regression Test Accuracy: {accuracy:.2f}%")
    
        mlflow.log_param("C", C)
        mlflow.log_metric("test_accuracy", accuracy)
        mlflow.end_run()
        return accuracy

    # Training Loop for FNN and CNN
    criterion = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(num_classifier_epochs):
        running_loss = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images) if CLASSIFIER == "CNN" else model(images.view(images.size(0), -1))

            optimizer.zero_grad()
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

        print(f"{CLASSIFIER} Epoch {epoch+1}: loss = {running_loss / len(train_loader):.4f}")

    # Model Evaluation
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images) if CLASSIFIER == "CNN" else model(images.view(images.size(0), -1))
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f"Test Accuracy: {accuracy:.2f}%")

    mlflow.log_metric("test_accuracy", accuracy)
    mlflow.end_run()
    return accuracy

if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=50)
    print(f"Best Parameters for {CLASSIFIER}:", study.best_params)
    print("Best Accuracy:", study.best_value)
Logistic Regression Test Accuracy: 84.43%
Logistic Regression Test Accuracy: 84.53%
Logistic Regression Test Accuracy: 84.59%
Logistic Regression Test Accuracy: 84.57%
Logistic Regression Test Accuracy: 84.47%
Logistic Regression Test Accuracy: 84.40%
Logistic Regression Test Accuracy: 84.41%
Logistic Regression Test Accuracy: 84.45%
Logistic Regression Test Accuracy: 84.22%
Logistic Regression Test Accuracy: 84.56%
Logistic Regression Test Accuracy: 84.60%
Logistic Regression Test Accuracy: 84.47%
Logistic Regression Test Accuracy: 84.61%
Logistic Regression Test Accuracy: 84.52%
Logistic Regression Test Accuracy: 84.53%
Logistic Regression Test Accuracy: 84.40%
Logistic Regression Test Accuracy: 84.49%
Logistic Regression Test Accuracy: 84.61%
Logistic Regression Test Accuracy: 84.60%
Logistic Regression Test Accuracy: 84.08%
Logistic Regression Test Accuracy: 84.48%
Logistic Regression Test Accuracy: 84.52%
Logistic Regression Test Accuracy: 84.52%
Logistic Regression Test Accuracy: 84.55%
Logistic Regression Test Accuracy: 84.55%
Logistic Regression Test Accuracy: 84.47%
Logistic Regression Test Accuracy: 84.54%
Logistic Regression Test Accuracy: 84.44%
Logistic Regression Test Accuracy: 84.48%
Logistic Regression Test Accuracy: 84.44%
Logistic Regression Test Accuracy: 84.48%
Logistic Regression Test Accuracy: 84.41%
Logistic Regression Test Accuracy: 84.34%
Logistic Regression Test Accuracy: 84.31%
Logistic Regression Test Accuracy: 84.50%
Logistic Regression Test Accuracy: 84.42%
Logistic Regression Test Accuracy: 84.54%
Logistic Regression Test Accuracy: 84.47%
Logistic Regression Test Accuracy: 84.55%
Logistic Regression Test Accuracy: 84.56%
Logistic Regression Test Accuracy: 84.48%
Logistic Regression Test Accuracy: 84.44%
Logistic Regression Test Accuracy: 84.53%
Logistic Regression Test Accuracy: 84.52%
Logistic Regression Test Accuracy: 84.40%
Logistic Regression Test Accuracy: 84.38%
Logistic Regression Test Accuracy: 84.52%
Logistic Regression Test Accuracy: 84.20%
Logistic Regression Test Accuracy: 84.35%
Logistic Regression Test Accuracy: 84.46%
Best Parameters for LogisticRegression: {'batch_size': 160, 'num_classifier_epochs': 5, 'C': 1.4998430383020946}
Best Accuracy: 84.61

[I 2025-03-16 08:39:39,596] A new study created in memory with name: no-name-bb3c57e3-f240-43e9-a8ab-0572b5db4374
[I 2025-03-16 08:39:46,150] Trial 0 finished with value: 84.43 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'C': 0.04765182474111022}. Best is trial 0 with value: 84.43.
[I 2025-03-16 08:39:52,696] Trial 1 finished with value: 84.53 and parameters: {'batch_size': 256, 'num_classifier_epochs': 5, 'C': 0.2753718641137896}. Best is trial 1 with value: 84.53.
[I 2025-03-16 08:39:59,205] Trial 2 finished with value: 84.59 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'C': 1.8083159254200056}. Best is trial 2 with value: 84.59.
[I 2025-03-16 08:40:05,592] Trial 3 finished with value: 84.57000000000001 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'C': 4.710011652216292}. Best is trial 2 with value: 84.59.
[I 2025-03-16 08:40:12,118] Trial 4 finished with value: 84.47 and parameters: {'batch_size': 256, 'num_classifier_epochs': 5, 'C': 5.583587292121003}. Best is trial 2 with value: 84.59.
[I 2025-03-16 08:40:18,625] Trial 5 finished with value: 84.39999999999999 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'C': 4.890152279858652}. Best is trial 2 with value: 84.59.
[I 2025-03-16 08:40:25,175] Trial 6 finished with value: 84.41 and parameters: {'batch_size': 256, 'num_classifier_epochs': 5, 'C': 9.822276056603316}. Best is trial 2 with value: 84.59.
[I 2025-03-16 08:40:31,703] Trial 7 finished with value: 84.45 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'C': 0.18295120844241494}. Best is trial 2 with value: 84.59.
[I 2025-03-16 08:40:38,128] Trial 8 finished with value: 84.22 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'C': 0.025016108854941516}. Best is trial 2 with value: 84.59.
[I 2025-03-16 08:40:44,659] Trial 9 finished with value: 84.56 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'C': 1.1694235829302306}. Best is trial 2 with value: 84.59.
[I 2025-03-16 08:40:51,141] Trial 10 finished with value: 84.6 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'C': 1.2174256148716063}. Best is trial 10 with value: 84.6.
[I 2025-03-16 08:40:57,732] Trial 11 finished with value: 84.47 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'C': 1.087349109853976}. Best is trial 10 with value: 84.6.
[I 2025-03-16 08:41:04,172] Trial 12 finished with value: 84.61 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'C': 1.4998430383020946}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:41:10,617] Trial 13 finished with value: 84.52 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'C': 0.5108950886326699}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:41:17,169] Trial 14 finished with value: 84.53 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'C': 0.09970545178055577}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:41:23,670] Trial 15 finished with value: 84.39999999999999 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'C': 0.6213123456196047}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:41:30,202] Trial 16 finished with value: 84.49 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'C': 2.4800690358057356}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:41:36,712] Trial 17 finished with value: 84.61 and parameters: {'batch_size': 224, 'num_classifier_epochs': 5, 'C': 0.560640235791551}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:41:43,221] Trial 18 finished with value: 84.6 and parameters: {'batch_size': 224, 'num_classifier_epochs': 5, 'C': 0.10977379785781691}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:41:49,689] Trial 19 finished with value: 84.08 and parameters: {'batch_size': 224, 'num_classifier_epochs': 5, 'C': 0.013966123800389607}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:41:56,148] Trial 20 finished with value: 84.48 and parameters: {'batch_size': 224, 'num_classifier_epochs': 5, 'C': 0.5122876407781881}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:42:02,628] Trial 21 finished with value: 84.52 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'C': 1.0112944487848032}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:42:09,182] Trial 22 finished with value: 84.52 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'C': 2.1136150748435836}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:42:15,676] Trial 23 finished with value: 84.55 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'C': 0.22079590811232758}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:42:22,198] Trial 24 finished with value: 84.55 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'C': 0.5563522278921428}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:42:28,750] Trial 25 finished with value: 84.47 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'C': 2.5744232169704735}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:42:35,362] Trial 26 finished with value: 84.54 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'C': 0.9174365864378011}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:42:41,961] Trial 27 finished with value: 84.44 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'C': 0.3830673282743313}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:42:48,497] Trial 28 finished with value: 84.48 and parameters: {'batch_size': 224, 'num_classifier_epochs': 5, 'C': 1.7177009839578117}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:42:55,059] Trial 29 finished with value: 84.44 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'C': 0.12907352305666378}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:43:01,607] Trial 30 finished with value: 84.48 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'C': 2.9640119921460175}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:43:08,183] Trial 31 finished with value: 84.41 and parameters: {'batch_size': 224, 'num_classifier_epochs': 5, 'C': 0.0641281057971579}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:43:14,746] Trial 32 finished with value: 84.34 and parameters: {'batch_size': 224, 'num_classifier_epochs': 5, 'C': 0.04324796057119752}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:43:21,272] Trial 33 finished with value: 84.31 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'C': 0.3183023804106568}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:43:27,839] Trial 34 finished with value: 84.5 and parameters: {'batch_size': 256, 'num_classifier_epochs': 5, 'C': 0.16451708205796792}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:43:34,396] Trial 35 finished with value: 84.42 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'C': 0.08026944216883272}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:43:40,957] Trial 36 finished with value: 84.54 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'C': 1.3876421299602493}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:43:47,545] Trial 37 finished with value: 84.47 and parameters: {'batch_size': 256, 'num_classifier_epochs': 5, 'C': 3.6065644969112416}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:43:54,131] Trial 38 finished with value: 84.55 and parameters: {'batch_size': 224, 'num_classifier_epochs': 5, 'C': 0.7280456951817234}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:44:00,694] Trial 39 finished with value: 84.56 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'C': 7.661483462137936}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:44:07,199] Trial 40 finished with value: 84.48 and parameters: {'batch_size': 256, 'num_classifier_epochs': 5, 'C': 0.04523141073645949}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:44:13,788] Trial 41 finished with value: 84.44 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'C': 1.6448879798303748}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:44:20,348] Trial 42 finished with value: 84.53 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'C': 3.968887306371254}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:44:26,843] Trial 43 finished with value: 84.52 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'C': 0.38606504636718136}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:44:33,357] Trial 44 finished with value: 84.39999999999999 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'C': 0.8585461667494334}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:44:39,845] Trial 45 finished with value: 84.38 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'C': 0.25322869430069556}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:44:46,326] Trial 46 finished with value: 84.52 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'C': 1.4607922085737501}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:44:52,857] Trial 47 finished with value: 84.2 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'C': 0.02870744582505788}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:44:59,359] Trial 48 finished with value: 84.35000000000001 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'C': 5.7226145747756165}. Best is trial 12 with value: 84.61.
[I 2025-03-16 08:45:05,890] Trial 49 finished with value: 84.46000000000001 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'C': 1.9863447487398138}. Best is trial 12 with value: 84.61.

Test Accuracy of Logistic Regression by C (inverse regularization strength)

Model 2: Feed Forward Network on Fashion MNIST Data
Click to Show Code and Output
Code
CLASSIFIER = "FNN"  # Change for FNN, LogisticRegression, or CNN

# Set MLflow experiment name
experiment = mlflow.set_experiment("pytorch-fmnist-fnn-noRBM")

# The FashionCNN definition and the Optuna objective function are
# identical to Model 1 above and are reused here unchanged.

if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=50)
    print(f"Best Parameters for {CLASSIFIER}:", study.best_params)
    print("Best Accuracy:", study.best_value)
FNN Epoch 1: loss = 0.5489
FNN Epoch 2: loss = 0.3896
FNN Epoch 3: loss = 0.3497
FNN Epoch 4: loss = 0.3196
FNN Epoch 5: loss = 0.3028
Test Accuracy: 86.20%
FNN Epoch 1: loss = 0.5322
FNN Epoch 2: loss = 0.3777
FNN Epoch 3: loss = 0.3350
FNN Epoch 4: loss = 0.3118
FNN Epoch 5: loss = 0.2931
Test Accuracy: 87.25%
FNN Epoch 1: loss = 0.5939
FNN Epoch 2: loss = 0.4197
FNN Epoch 3: loss = 0.3757
FNN Epoch 4: loss = 0.3507
FNN Epoch 5: loss = 0.3282
Test Accuracy: 87.01%
FNN Epoch 1: loss = 0.5227
FNN Epoch 2: loss = 0.3844
FNN Epoch 3: loss = 0.3390
FNN Epoch 4: loss = 0.3120
FNN Epoch 5: loss = 0.2963
Test Accuracy: 87.37%
FNN Epoch 1: loss = 0.8703
FNN Epoch 2: loss = 0.5190
FNN Epoch 3: loss = 0.4605
FNN Epoch 4: loss = 0.4281
FNN Epoch 5: loss = 0.4065
Test Accuracy: 84.96%
FNN Epoch 1: loss = 0.5516
FNN Epoch 2: loss = 0.3883
FNN Epoch 3: loss = 0.3440
FNN Epoch 4: loss = 0.3138
FNN Epoch 5: loss = 0.2981
Test Accuracy: 87.30%
FNN Epoch 1: loss = 0.7628
FNN Epoch 2: loss = 0.4851
FNN Epoch 3: loss = 0.4372
FNN Epoch 4: loss = 0.4109
FNN Epoch 5: loss = 0.3907
Test Accuracy: 85.42%
FNN Epoch 1: loss = 0.4984
FNN Epoch 2: loss = 0.3658
FNN Epoch 3: loss = 0.3304
FNN Epoch 4: loss = 0.3075
FNN Epoch 5: loss = 0.2924
Test Accuracy: 88.03%
FNN Epoch 1: loss = 0.6337
FNN Epoch 2: loss = 0.4334
FNN Epoch 3: loss = 0.3905
FNN Epoch 4: loss = 0.3701
FNN Epoch 5: loss = 0.3458
Test Accuracy: 86.24%
FNN Epoch 1: loss = 0.5426
FNN Epoch 2: loss = 0.3878
FNN Epoch 3: loss = 0.3469
FNN Epoch 4: loss = 0.3178
FNN Epoch 5: loss = 0.3007
Test Accuracy: 87.25%
FNN Epoch 1: loss = 0.4990
FNN Epoch 2: loss = 0.3707
FNN Epoch 3: loss = 0.3305
FNN Epoch 4: loss = 0.3078
FNN Epoch 5: loss = 0.2909
Test Accuracy: 87.77%
FNN Epoch 1: loss = 0.4989
FNN Epoch 2: loss = 0.3642
FNN Epoch 3: loss = 0.3268
FNN Epoch 4: loss = 0.3056
FNN Epoch 5: loss = 0.2907
Test Accuracy: 87.17%
FNN Epoch 1: loss = 0.5050
FNN Epoch 2: loss = 0.3719
FNN Epoch 3: loss = 0.3334
FNN Epoch 4: loss = 0.3054
FNN Epoch 5: loss = 0.2906
Test Accuracy: 87.62%
FNN Epoch 1: loss = 0.5185
FNN Epoch 2: loss = 0.3757
FNN Epoch 3: loss = 0.3340
FNN Epoch 4: loss = 0.3078
FNN Epoch 5: loss = 0.2928
Test Accuracy: 86.61%
FNN Epoch 1: loss = 0.5272
FNN Epoch 2: loss = 0.3837
FNN Epoch 3: loss = 0.3400
FNN Epoch 4: loss = 0.3157
FNN Epoch 5: loss = 0.2955
Test Accuracy: 86.84%
FNN Epoch 1: loss = 0.4964
FNN Epoch 2: loss = 0.3727
FNN Epoch 3: loss = 0.3345
FNN Epoch 4: loss = 0.3109
FNN Epoch 5: loss = 0.2925
Test Accuracy: 86.43%
FNN Epoch 1: loss = 0.5228
FNN Epoch 2: loss = 0.3807
FNN Epoch 3: loss = 0.3395
FNN Epoch 4: loss = 0.3130
FNN Epoch 5: loss = 0.2922
Test Accuracy: 87.11%
FNN Epoch 1: loss = 0.5627
FNN Epoch 2: loss = 0.4034
FNN Epoch 3: loss = 0.3634
FNN Epoch 4: loss = 0.3332
FNN Epoch 5: loss = 0.3143
Test Accuracy: 87.48%
FNN Epoch 1: loss = 0.5064
FNN Epoch 2: loss = 0.3714
FNN Epoch 3: loss = 0.3348
FNN Epoch 4: loss = 0.3130
FNN Epoch 5: loss = 0.2946
Test Accuracy: 87.42%
FNN Epoch 1: loss = 0.5053
FNN Epoch 2: loss = 0.3703
FNN Epoch 3: loss = 0.3317
FNN Epoch 4: loss = 0.3087
FNN Epoch 5: loss = 0.2900
Test Accuracy: 87.40%
FNN Epoch 1: loss = 0.5644
FNN Epoch 2: loss = 0.3984
FNN Epoch 3: loss = 0.3564
FNN Epoch 4: loss = 0.3273
FNN Epoch 5: loss = 0.3059
Test Accuracy: 86.88%
FNN Epoch 1: loss = 0.5135
FNN Epoch 2: loss = 0.3740
FNN Epoch 3: loss = 0.3330
FNN Epoch 4: loss = 0.3056
FNN Epoch 5: loss = 0.2889
Test Accuracy: 87.40%
FNN Epoch 1: loss = 0.5208
FNN Epoch 2: loss = 0.3820
FNN Epoch 3: loss = 0.3408
FNN Epoch 4: loss = 0.3177
FNN Epoch 5: loss = 0.2981
Test Accuracy: 86.63%
FNN Epoch 1: loss = 0.5069
FNN Epoch 2: loss = 0.3733
FNN Epoch 3: loss = 0.3335
FNN Epoch 4: loss = 0.3101
FNN Epoch 5: loss = 0.2901
Test Accuracy: 87.27%
FNN Epoch 1: loss = 0.5037
FNN Epoch 2: loss = 0.3755
FNN Epoch 3: loss = 0.3340
FNN Epoch 4: loss = 0.3112
FNN Epoch 5: loss = 0.2919
Test Accuracy: 87.89%
FNN Epoch 1: loss = 0.5291
FNN Epoch 2: loss = 0.3797
FNN Epoch 3: loss = 0.3417
FNN Epoch 4: loss = 0.3176
FNN Epoch 5: loss = 0.3009
Test Accuracy: 85.89%
FNN Epoch 1: loss = 0.5183
FNN Epoch 2: loss = 0.3810
FNN Epoch 3: loss = 0.3373
FNN Epoch 4: loss = 0.3113
FNN Epoch 5: loss = 0.2967
Test Accuracy: 87.53%
FNN Epoch 1: loss = 0.4995
FNN Epoch 2: loss = 0.3694
FNN Epoch 3: loss = 0.3316
FNN Epoch 4: loss = 0.3072
FNN Epoch 5: loss = 0.2930
Test Accuracy: 87.63%
FNN Epoch 1: loss = 0.5485
FNN Epoch 2: loss = 0.3953
FNN Epoch 3: loss = 0.3567
FNN Epoch 4: loss = 0.3292
FNN Epoch 5: loss = 0.3092
Test Accuracy: 87.25%
FNN Epoch 1: loss = 0.5146
FNN Epoch 2: loss = 0.3754
FNN Epoch 3: loss = 0.3311
FNN Epoch 4: loss = 0.3104
FNN Epoch 5: loss = 0.2922
Test Accuracy: 87.43%
FNN Epoch 1: loss = 0.5039
FNN Epoch 2: loss = 0.3690
FNN Epoch 3: loss = 0.3292
FNN Epoch 4: loss = 0.3079
FNN Epoch 5: loss = 0.2865
Test Accuracy: 87.52%
FNN Epoch 1: loss = 0.4893
FNN Epoch 2: loss = 0.3679
FNN Epoch 3: loss = 0.3329
FNN Epoch 4: loss = 0.3064
FNN Epoch 5: loss = 0.2911
Test Accuracy: 87.36%
FNN Epoch 1: loss = 0.4936
FNN Epoch 2: loss = 0.3700
FNN Epoch 3: loss = 0.3361
FNN Epoch 4: loss = 0.3090
FNN Epoch 5: loss = 0.2908
Test Accuracy: 87.22%
FNN Epoch 1: loss = 0.5159
FNN Epoch 2: loss = 0.3746
FNN Epoch 3: loss = 0.3351
FNN Epoch 4: loss = 0.3098
FNN Epoch 5: loss = 0.2921
Test Accuracy: 87.18%
FNN Epoch 1: loss = 0.4972
FNN Epoch 2: loss = 0.3664
FNN Epoch 3: loss = 0.3287
FNN Epoch 4: loss = 0.3081
FNN Epoch 5: loss = 0.2877
Test Accuracy: 87.84%
FNN Epoch 1: loss = 0.5157
FNN Epoch 2: loss = 0.3773
FNN Epoch 3: loss = 0.3326
FNN Epoch 4: loss = 0.3086
FNN Epoch 5: loss = 0.2908
Test Accuracy: 87.21%
FNN Epoch 1: loss = 0.5277
FNN Epoch 2: loss = 0.3834
FNN Epoch 3: loss = 0.3430
FNN Epoch 4: loss = 0.3132
FNN Epoch 5: loss = 0.2957
Test Accuracy: 87.19%
FNN Epoch 1: loss = 0.7073
FNN Epoch 2: loss = 0.4611
FNN Epoch 3: loss = 0.4115
FNN Epoch 4: loss = 0.3835
FNN Epoch 5: loss = 0.3616
Test Accuracy: 86.34%
FNN Epoch 1: loss = 0.4896
FNN Epoch 2: loss = 0.3692
FNN Epoch 3: loss = 0.3320
FNN Epoch 4: loss = 0.3090
FNN Epoch 5: loss = 0.2936
Test Accuracy: 87.49%
FNN Epoch 1: loss = 0.5253
FNN Epoch 2: loss = 0.3779
FNN Epoch 3: loss = 0.3406
FNN Epoch 4: loss = 0.3159
FNN Epoch 5: loss = 0.2978
Test Accuracy: 85.65%
FNN Epoch 1: loss = 0.5041
FNN Epoch 2: loss = 0.3684
FNN Epoch 3: loss = 0.3319
FNN Epoch 4: loss = 0.3065
FNN Epoch 5: loss = 0.2884
Test Accuracy: 87.05%
FNN Epoch 1: loss = 0.4978
FNN Epoch 2: loss = 0.3710
FNN Epoch 3: loss = 0.3309
FNN Epoch 4: loss = 0.3084
FNN Epoch 5: loss = 0.2910
Test Accuracy: 88.32%
FNN Epoch 1: loss = 0.4999
FNN Epoch 2: loss = 0.3713
FNN Epoch 3: loss = 0.3326
FNN Epoch 4: loss = 0.3079
FNN Epoch 5: loss = 0.2891
Test Accuracy: 86.70%
FNN Epoch 1: loss = 0.5131
FNN Epoch 2: loss = 0.3783
FNN Epoch 3: loss = 0.3383
FNN Epoch 4: loss = 0.3095
FNN Epoch 5: loss = 0.2942
Test Accuracy: 87.27%
FNN Epoch 1: loss = 0.4936
FNN Epoch 2: loss = 0.3656
FNN Epoch 3: loss = 0.3327
FNN Epoch 4: loss = 0.3085
FNN Epoch 5: loss = 0.2924
Test Accuracy: 87.78%
FNN Epoch 1: loss = 0.5042
FNN Epoch 2: loss = 0.3739
FNN Epoch 3: loss = 0.3350
FNN Epoch 4: loss = 0.3093
FNN Epoch 5: loss = 0.2911
Test Accuracy: 87.66%
FNN Epoch 1: loss = 0.4898
FNN Epoch 2: loss = 0.3672
FNN Epoch 3: loss = 0.3332
FNN Epoch 4: loss = 0.3068
FNN Epoch 5: loss = 0.2935
Test Accuracy: 87.74%
FNN Epoch 1: loss = 0.4954
FNN Epoch 2: loss = 0.3753
FNN Epoch 3: loss = 0.3338
FNN Epoch 4: loss = 0.3126
FNN Epoch 5: loss = 0.2947
Test Accuracy: 87.34%
FNN Epoch 1: loss = 0.5372
FNN Epoch 2: loss = 0.3928
FNN Epoch 3: loss = 0.3471
FNN Epoch 4: loss = 0.3197
FNN Epoch 5: loss = 0.2999
Test Accuracy: 87.53%
FNN Epoch 1: loss = 0.5962
FNN Epoch 2: loss = 0.4197
FNN Epoch 3: loss = 0.3761
FNN Epoch 4: loss = 0.3513
FNN Epoch 5: loss = 0.3302
Test Accuracy: 86.86%
Best Parameters for FNN: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 292, 'learning_rate': 0.0016356836132324745}
Best Accuracy: 88.32

[I 2025-03-16 08:45:06,549] A new study created in memory with name: no-name-50931ee8-76bd-4b92-ac0c-fc43f79ef9b5
[I 2025-03-16 08:45:19,285] Trial 0 finished with value: 86.2 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'fnn_hidden': 249, 'learning_rate': 0.002041334889842791}. Best is trial 0 with value: 86.2.
[I 2025-03-16 08:45:34,035] Trial 1 finished with value: 87.25 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'fnn_hidden': 362, 'learning_rate': 0.0015657691305288111}. Best is trial 1 with value: 87.25.
[I 2025-03-16 08:45:47,584] Trial 2 finished with value: 87.01 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'fnn_hidden': 345, 'learning_rate': 0.0006724048207704316}. Best is trial 1 with value: 87.25.
[I 2025-03-16 08:46:00,212] Trial 3 finished with value: 87.37 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'fnn_hidden': 287, 'learning_rate': 0.002486279310956767}. Best is trial 3 with value: 87.37.
[I 2025-03-16 08:46:11,920] Trial 4 finished with value: 84.96 and parameters: {'batch_size': 256, 'num_classifier_epochs': 5, 'fnn_hidden': 363, 'learning_rate': 0.00021536368651932085}. Best is trial 3 with value: 87.37.
[I 2025-03-16 08:46:23,622] Trial 5 finished with value: 87.3 and parameters: {'batch_size': 256, 'num_classifier_epochs': 5, 'fnn_hidden': 358, 'learning_rate': 0.002221211523059716}. Best is trial 3 with value: 87.37.
[I 2025-03-16 08:46:40,172] Trial 6 finished with value: 85.42 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'fnn_hidden': 199, 'learning_rate': 0.00019794647537673616}. Best is trial 3 with value: 87.37.
[I 2025-03-16 08:47:00,373] Trial 7 finished with value: 88.03 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 274, 'learning_rate': 0.001758302803314886}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:47:13,032] Trial 8 finished with value: 86.24 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'fnn_hidden': 217, 'learning_rate': 0.0007367070230066769}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:47:26,552] Trial 9 finished with value: 87.25 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'fnn_hidden': 197, 'learning_rate': 0.002364596760794612}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:47:46,991] Trial 10 finished with value: 87.77 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 304, 'learning_rate': 0.0015796871230063132}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:48:07,244] Trial 11 finished with value: 87.17 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 313, 'learning_rate': 0.0015278021536822292}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:48:27,407] Trial 12 finished with value: 87.62 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 283, 'learning_rate': 0.0011652982907131503}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:48:44,326] Trial 13 finished with value: 86.61 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'fnn_hidden': 254, 'learning_rate': 0.001756723478328775}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:49:01,298] Trial 14 finished with value: 86.84 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'fnn_hidden': 319, 'learning_rate': 0.0011507600744882305}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:49:21,556] Trial 15 finished with value: 86.43 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 258, 'learning_rate': 0.0018750338833960333}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:49:36,425] Trial 16 finished with value: 87.11 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'fnn_hidden': 318, 'learning_rate': 0.0014568787940341143}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:49:51,231] Trial 17 finished with value: 87.48 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'fnn_hidden': 270, 'learning_rate': 0.0009544677913653444}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:50:12,261] Trial 18 finished with value: 87.42 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 228, 'learning_rate': 0.001782689509938698}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:50:29,200] Trial 19 finished with value: 87.4 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'fnn_hidden': 304, 'learning_rate': 0.001991099390389825}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:50:41,640] Trial 20 finished with value: 86.88 and parameters: {'batch_size': 224, 'num_classifier_epochs': 5, 'fnn_hidden': 383, 'learning_rate': 0.0013742807083120545}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:51:02,953] Trial 21 finished with value: 87.4 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 291, 'learning_rate': 0.0011686121600534317}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:51:23,828] Trial 22 finished with value: 86.63 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 283, 'learning_rate': 0.000945059779116471}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:51:40,696] Trial 23 finished with value: 87.27 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'fnn_hidden': 335, 'learning_rate': 0.0016322584577049674}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:52:01,455] Trial 24 finished with value: 87.89 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 274, 'learning_rate': 0.0012249701135960802}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:52:18,374] Trial 25 finished with value: 85.89 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'fnn_hidden': 242, 'learning_rate': 0.0012886471671216483}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:52:33,143] Trial 26 finished with value: 87.53 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'fnn_hidden': 269, 'learning_rate': 0.002111862225819758}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:52:53,543] Trial 27 finished with value: 87.63 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 302, 'learning_rate': 0.0016772740958357606}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:53:10,700] Trial 28 finished with value: 87.25 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'fnn_hidden': 234, 'learning_rate': 0.0009232121615853049}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:53:25,766] Trial 29 finished with value: 87.43 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'fnn_hidden': 266, 'learning_rate': 0.002070747672134195}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:53:46,245] Trial 30 finished with value: 87.52 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 330, 'learning_rate': 0.001343442479518513}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:54:06,556] Trial 31 finished with value: 87.36 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 300, 'learning_rate': 0.001684415274628418}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:54:26,863] Trial 32 finished with value: 87.22 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 299, 'learning_rate': 0.0019207317118902945}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:54:43,737] Trial 33 finished with value: 87.18 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'fnn_hidden': 308, 'learning_rate': 0.0015793749159381848}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:55:03,917] Trial 34 finished with value: 87.84 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 279, 'learning_rate': 0.0017840485399559939}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:55:20,973] Trial 35 finished with value: 87.21 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'fnn_hidden': 276, 'learning_rate': 0.0014641867638422465}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:55:35,840] Trial 36 finished with value: 87.19 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'fnn_hidden': 245, 'learning_rate': 0.0018518309681294712}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:55:48,158] Trial 37 finished with value: 86.34 and parameters: {'batch_size': 224, 'num_classifier_epochs': 5, 'fnn_hidden': 290, 'learning_rate': 0.00046002985336839725}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:56:09,032] Trial 38 finished with value: 87.49 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 264, 'learning_rate': 0.0022882566440419897}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:56:22,796] Trial 39 finished with value: 85.65 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'fnn_hidden': 277, 'learning_rate': 0.0021699473555820362}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:56:43,320] Trial 40 finished with value: 87.05 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 332, 'learning_rate': 0.0012455527992751194}. Best is trial 7 with value: 88.03.
[I 2025-03-16 08:57:04,108] Trial 41 finished with value: 88.32 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 292, 'learning_rate': 0.0016356836132324745}. Best is trial 41 with value: 88.32.
[I 2025-03-16 08:57:24,151] Trial 42 finished with value: 86.7 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 290, 'learning_rate': 0.001469203055120627}. Best is trial 41 with value: 88.32.
[I 2025-03-16 08:57:41,154] Trial 43 finished with value: 87.27 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'fnn_hidden': 253, 'learning_rate': 0.0017747349435670693}. Best is trial 41 with value: 88.32.
[I 2025-03-16 08:58:01,654] Trial 44 finished with value: 87.78 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 277, 'learning_rate': 0.0019528761161828165}. Best is trial 41 with value: 88.32.
[I 2025-03-16 08:58:18,184] Trial 45 finished with value: 87.66 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'fnn_hidden': 279, 'learning_rate': 0.001996414590988217}. Best is trial 41 with value: 88.32.
[I 2025-03-16 08:58:38,321] Trial 46 finished with value: 87.74 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 263, 'learning_rate': 0.00230582576518422}. Best is trial 41 with value: 88.32.
[I 2025-03-16 08:58:59,152] Trial 47 finished with value: 87.34 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'fnn_hidden': 211, 'learning_rate': 0.0019151235109020274}. Best is trial 41 with value: 88.32.
[I 2025-03-16 08:59:16,090] Trial 48 finished with value: 87.53 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'fnn_hidden': 273, 'learning_rate': 0.0010358063577284046}. Best is trial 41 with value: 88.32.
[I 2025-03-16 08:59:29,735] Trial 49 finished with value: 86.86 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'fnn_hidden': 291, 'learning_rate': 0.0007191926238311791}. Best is trial 41 with value: 88.32.

Test Accuracy by FNN Hidden Units


Model 3: Convolutional Neural Network on Fashion MNIST Data
The base CNN structure is adapted from a Kaggle notebook. As with the FNN, Optuna tunes the batch size, learning rate, filter counts, and kernel sizes of the two convolutional blocks over 50 trials, using test accuracy as the objective.

Click to Show Code and Output
Code
CLASSIFIER = "CNN"  # Change for FNN, LogisticRegression, or CNN

# Set MLflow experiment name
if CLASSIFIER == "LogisticRegression":
    experiment = mlflow.set_experiment("pytorch-fmnist-lr-noRBM")
elif CLASSIFIER == "FNN":
    experiment = mlflow.set_experiment("pytorch-fmnist-fnn-noRBM")
elif CLASSIFIER == "CNN":
    experiment = mlflow.set_experiment("pytorch-fmnist-cnn-noRBM")

# Define CNN model
class FashionCNN(nn.Module):
    def __init__(self, filters1, filters2, kernel1, kernel2):
        super(FashionCNN, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=filters1, kernel_size=kernel1, padding=1),
            nn.BatchNorm2d(filters1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=filters1, out_channels=filters2, kernel_size=kernel2),
            nn.BatchNorm2d(filters2),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.fc1 = None  # built lazily in forward() once the flattened size is known
        self.drop = nn.Dropout2d(0.25)  # applied to flattened features (nn.Dropout would be the idiomatic choice)
        self.fc2 = nn.Linear(in_features=600, out_features=120)
        self.fc3 = nn.Linear(in_features=120, out_features=10)
        

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.view(out.size(0), -1)  # flatten the conv feature maps
        # fc1 is created on the first forward pass so its input size matches
        # the flattened feature map (which depends on the sampled kernel sizes).
        # Caveat: an optimizer constructed before this first pass will not
        # include fc1's parameters.
        if self.fc1 is None:
            self.fc1 = nn.Linear(out.shape[1], 600).to(x.device)
        out = self.fc1(out)
        out = self.drop(out)
        out = self.fc2(out)
        out = self.fc3(out)
        return out

# Define Optuna objective function
def objective(trial):
    batch_size = trial.suggest_int("batch_size", 64, 256, step=32)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

    mlflow.start_run(experiment_id=experiment.experiment_id)
    num_classifier_epochs = trial.suggest_int("num_classifier_epochs", 5, 5)  # fixed at 5 for all trials
    mlflow.log_param("num_classifier_epochs", num_classifier_epochs)

    if CLASSIFIER == "FNN":
        hidden_size = trial.suggest_int("fnn_hidden", 192, 384)
        learning_rate = trial.suggest_float("learning_rate", 0.0001, 0.0025)

        mlflow.log_param("classifier", "FNN")
        mlflow.log_param("fnn_hidden", hidden_size)
        mlflow.log_param("learning_rate", learning_rate)

        model = nn.Sequential(
            nn.Linear(784, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 10)
        ).to(device)

        optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    elif CLASSIFIER == "CNN":
        filters1 = trial.suggest_int("filters1", 16, 64, step=16)
        filters2 = trial.suggest_int("filters2", 32, 128, step=32)
        kernel1 = trial.suggest_int("kernel1", 3, 5)
        kernel2 = trial.suggest_int("kernel2", 3, 5)
        learning_rate = trial.suggest_float("learning_rate", 0.0001, 0.0025)

        mlflow.log_param("classifier", "CNN")
        mlflow.log_param("filters1", filters1)
        mlflow.log_param("filters2", filters2)
        mlflow.log_param("kernel1", kernel1)
        mlflow.log_param("kernel2", kernel2)
        mlflow.log_param("learning_rate", learning_rate)

        model = FashionCNN(filters1, filters2, kernel1, kernel2).to(device)
        optimizer = optim.Adam(model.parameters(), lr=learning_rate)

      
    elif CLASSIFIER == "LogisticRegression":
        mlflow.log_param("classifier", "LogisticRegression")
    
        # Prepare data for Logistic Regression (Flatten 28x28 images to 784 features)
        train_features = train_dataset.data.view(-1, 784).numpy()
        train_labels = train_dataset.targets.numpy()
        test_features = test_dataset.data.view(-1, 784).numpy()
        test_labels = test_dataset.targets.numpy()
    
        # Normalize the pixel values to [0,1] for better convergence
        train_features = train_features / 255.0
        test_features = test_features / 255.0
    
    
        C = trial.suggest_float("C", 0.01, 10.0, log=True)
        solver = "saga"  # stochastic solver that scales to the 60k x 784 training set

        # max_iter caps saga's passes over the data, so five passes may stop
        # before full convergence
        model = LogisticRegression(C=C, max_iter=num_classifier_epochs, solver=solver)
        model.fit(train_features, train_labels)
    
    
        predictions = model.predict(test_features)
        accuracy = accuracy_score(test_labels, predictions) * 100
        print(f"Logistic Regression Test Accuracy: {accuracy:.2f}%")
    
        mlflow.log_param("C", C)
        mlflow.log_metric("test_accuracy", accuracy)
        mlflow.end_run()
        return accuracy

    # Training Loop for FNN and CNN
    criterion = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(num_classifier_epochs):
        running_loss = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images) if CLASSIFIER == "CNN" else model(images.view(images.size(0), -1))

            optimizer.zero_grad()
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

        print(f"{CLASSIFIER} Epoch {epoch+1}: loss = {running_loss / len(train_loader):.4f}")

    # Model Evaluation
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images) if CLASSIFIER == "CNN" else model(images.view(images.size(0), -1))
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    print(f"Test Accuracy: {accuracy:.2f}%")

    mlflow.log_metric("test_accuracy", accuracy)
    mlflow.end_run()
    return accuracy

if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=50)
    print(f"Best Parameters for {CLASSIFIER}:", study.best_params)
    print("Best Accuracy:", study.best_value)
CNN Epoch 1: loss = 0.4459
CNN Epoch 2: loss = 0.3209
CNN Epoch 3: loss = 0.2842
CNN Epoch 4: loss = 0.2659
CNN Epoch 5: loss = 0.2506
Test Accuracy: 89.40%
CNN Epoch 1: loss = 0.5388
CNN Epoch 2: loss = 0.3492
CNN Epoch 3: loss = 0.3142
CNN Epoch 4: loss = 0.2938
CNN Epoch 5: loss = 0.2800
Test Accuracy: 88.96%
CNN Epoch 1: loss = 0.4648
CNN Epoch 2: loss = 0.3264
CNN Epoch 3: loss = 0.2895
CNN Epoch 4: loss = 0.2676
CNN Epoch 5: loss = 0.2531
Test Accuracy: 89.78%
CNN Epoch 1: loss = 0.4906
CNN Epoch 2: loss = 0.3351
CNN Epoch 3: loss = 0.2999
CNN Epoch 4: loss = 0.2811
CNN Epoch 5: loss = 0.2629
Test Accuracy: 89.03%
CNN Epoch 1: loss = 0.4291
CNN Epoch 2: loss = 0.3172
CNN Epoch 3: loss = 0.2838
CNN Epoch 4: loss = 0.2617
CNN Epoch 5: loss = 0.2467
Test Accuracy: 89.66%
CNN Epoch 1: loss = 0.5091
CNN Epoch 2: loss = 0.3187
CNN Epoch 3: loss = 0.2821
CNN Epoch 4: loss = 0.2590
CNN Epoch 5: loss = 0.2420
Test Accuracy: 89.68%
CNN Epoch 1: loss = 0.4141
CNN Epoch 2: loss = 0.3009
CNN Epoch 3: loss = 0.2651
CNN Epoch 4: loss = 0.2424
CNN Epoch 5: loss = 0.2198
Test Accuracy: 89.32%
CNN Epoch 1: loss = 0.4183
CNN Epoch 2: loss = 0.3024
CNN Epoch 3: loss = 0.2766
CNN Epoch 4: loss = 0.2573
CNN Epoch 5: loss = 0.2437
Test Accuracy: 90.47%
CNN Epoch 1: loss = 0.4590
CNN Epoch 2: loss = 0.3238
CNN Epoch 3: loss = 0.2902
CNN Epoch 4: loss = 0.2716
CNN Epoch 5: loss = 0.2520
Test Accuracy: 88.13%
CNN Epoch 1: loss = 0.5734
CNN Epoch 2: loss = 0.3621
CNN Epoch 3: loss = 0.3230
CNN Epoch 4: loss = 0.2993
CNN Epoch 5: loss = 0.2820
Test Accuracy: 88.08%
CNN Epoch 1: loss = 0.4208
CNN Epoch 2: loss = 0.3115
CNN Epoch 3: loss = 0.2802
CNN Epoch 4: loss = 0.2616
CNN Epoch 5: loss = 0.2460
Test Accuracy: 90.58%
CNN Epoch 1: loss = 0.4225
CNN Epoch 2: loss = 0.3127
CNN Epoch 3: loss = 0.2809
CNN Epoch 4: loss = 0.2579
CNN Epoch 5: loss = 0.2445
Test Accuracy: 90.05%
CNN Epoch 1: loss = 0.4258
CNN Epoch 2: loss = 0.3055
CNN Epoch 3: loss = 0.2735
CNN Epoch 4: loss = 0.2551
CNN Epoch 5: loss = 0.2393
Test Accuracy: 89.98%
CNN Epoch 1: loss = 0.4277
CNN Epoch 2: loss = 0.3230
CNN Epoch 3: loss = 0.2964
CNN Epoch 4: loss = 0.2788
CNN Epoch 5: loss = 0.2607
Test Accuracy: 89.48%
CNN Epoch 1: loss = 0.4129
CNN Epoch 2: loss = 0.2932
CNN Epoch 3: loss = 0.2609
CNN Epoch 4: loss = 0.2397
CNN Epoch 5: loss = 0.2227
Test Accuracy: 90.65%
CNN Epoch 1: loss = 0.4244
CNN Epoch 2: loss = 0.3110
CNN Epoch 3: loss = 0.2787
CNN Epoch 4: loss = 0.2589
CNN Epoch 5: loss = 0.2452
Test Accuracy: 89.68%
CNN Epoch 1: loss = 0.4181
CNN Epoch 2: loss = 0.3081
CNN Epoch 3: loss = 0.2683
CNN Epoch 4: loss = 0.2460
CNN Epoch 5: loss = 0.2263
Test Accuracy: 86.68%
CNN Epoch 1: loss = 0.4351
CNN Epoch 2: loss = 0.3152
CNN Epoch 3: loss = 0.2797
CNN Epoch 4: loss = 0.2622
CNN Epoch 5: loss = 0.2482
Test Accuracy: 90.30%
CNN Epoch 1: loss = 0.4282
CNN Epoch 2: loss = 0.3174
CNN Epoch 3: loss = 0.2850
CNN Epoch 4: loss = 0.2612
CNN Epoch 5: loss = 0.2418
Test Accuracy: 89.87%
CNN Epoch 1: loss = 0.4243
CNN Epoch 2: loss = 0.2980
CNN Epoch 3: loss = 0.2687
CNN Epoch 4: loss = 0.2436
CNN Epoch 5: loss = 0.2311
Test Accuracy: 89.60%
CNN Epoch 1: loss = 0.4278
CNN Epoch 2: loss = 0.3220
CNN Epoch 3: loss = 0.2899
CNN Epoch 4: loss = 0.2708
CNN Epoch 5: loss = 0.2591
Test Accuracy: 90.19%
CNN Epoch 1: loss = 0.6207
CNN Epoch 2: loss = 0.3593
CNN Epoch 3: loss = 0.3164
CNN Epoch 4: loss = 0.2932
CNN Epoch 5: loss = 0.2769
Test Accuracy: 89.05%
CNN Epoch 1: loss = 0.4243
CNN Epoch 2: loss = 0.3053
CNN Epoch 3: loss = 0.2752
CNN Epoch 4: loss = 0.2549
CNN Epoch 5: loss = 0.2399
Test Accuracy: 89.97%
CNN Epoch 1: loss = 0.4036
CNN Epoch 2: loss = 0.2978
CNN Epoch 3: loss = 0.2652
CNN Epoch 4: loss = 0.2453
CNN Epoch 5: loss = 0.2273
Test Accuracy: 89.91%
CNN Epoch 1: loss = 0.4250
CNN Epoch 2: loss = 0.3066
CNN Epoch 3: loss = 0.2737
CNN Epoch 4: loss = 0.2564
CNN Epoch 5: loss = 0.2368
Test Accuracy: 90.36%
CNN Epoch 1: loss = 0.4453
CNN Epoch 2: loss = 0.3026
CNN Epoch 3: loss = 0.2671
CNN Epoch 4: loss = 0.2466
CNN Epoch 5: loss = 0.2313
Test Accuracy: 90.39%
CNN Epoch 1: loss = 0.4377
CNN Epoch 2: loss = 0.3289
CNN Epoch 3: loss = 0.2952
CNN Epoch 4: loss = 0.2728
CNN Epoch 5: loss = 0.2583
Test Accuracy: 89.34%
CNN Epoch 1: loss = 0.4433
CNN Epoch 2: loss = 0.3197
CNN Epoch 3: loss = 0.2928
CNN Epoch 4: loss = 0.2720
CNN Epoch 5: loss = 0.2572
Test Accuracy: 89.94%
CNN Epoch 1: loss = 0.4085
CNN Epoch 2: loss = 0.2990
CNN Epoch 3: loss = 0.2639
CNN Epoch 4: loss = 0.2441
CNN Epoch 5: loss = 0.2246
Test Accuracy: 90.39%
CNN Epoch 1: loss = 0.4169
CNN Epoch 2: loss = 0.3150
CNN Epoch 3: loss = 0.2812
CNN Epoch 4: loss = 0.2646
CNN Epoch 5: loss = 0.2524
Test Accuracy: 90.35%
CNN Epoch 1: loss = 0.4181
CNN Epoch 2: loss = 0.2965
CNN Epoch 3: loss = 0.2608
CNN Epoch 4: loss = 0.2398
CNN Epoch 5: loss = 0.2247
Test Accuracy: 90.74%
CNN Epoch 1: loss = 0.4103
CNN Epoch 2: loss = 0.2944
CNN Epoch 3: loss = 0.2614
CNN Epoch 4: loss = 0.2370
CNN Epoch 5: loss = 0.2211
Test Accuracy: 90.40%
CNN Epoch 1: loss = 0.3929
CNN Epoch 2: loss = 0.2857
CNN Epoch 3: loss = 0.2511
CNN Epoch 4: loss = 0.2302
CNN Epoch 5: loss = 0.2127
Test Accuracy: 89.82%
CNN Epoch 1: loss = 0.4309
CNN Epoch 2: loss = 0.2981
CNN Epoch 3: loss = 0.2607
CNN Epoch 4: loss = 0.2415
CNN Epoch 5: loss = 0.2219
Test Accuracy: 89.77%
CNN Epoch 1: loss = 0.4311
CNN Epoch 2: loss = 0.3084
CNN Epoch 3: loss = 0.2741
CNN Epoch 4: loss = 0.2536
CNN Epoch 5: loss = 0.2355
Test Accuracy: 90.17%
CNN Epoch 1: loss = 0.4078
CNN Epoch 2: loss = 0.2937
CNN Epoch 3: loss = 0.2590
CNN Epoch 4: loss = 0.2374
CNN Epoch 5: loss = 0.2233
Test Accuracy: 90.14%
CNN Epoch 1: loss = 0.4216
CNN Epoch 2: loss = 0.3111
CNN Epoch 3: loss = 0.2863
CNN Epoch 4: loss = 0.2682
CNN Epoch 5: loss = 0.2535
Test Accuracy: 89.67%
CNN Epoch 1: loss = 0.4287
CNN Epoch 2: loss = 0.3083
CNN Epoch 3: loss = 0.2711
CNN Epoch 4: loss = 0.2472
CNN Epoch 5: loss = 0.2267
Test Accuracy: 90.13%
CNN Epoch 1: loss = 0.4703
CNN Epoch 2: loss = 0.3168
CNN Epoch 3: loss = 0.2823
CNN Epoch 4: loss = 0.2627
CNN Epoch 5: loss = 0.2475
Test Accuracy: 90.05%
CNN Epoch 1: loss = 0.4079
CNN Epoch 2: loss = 0.3016
CNN Epoch 3: loss = 0.2679
CNN Epoch 4: loss = 0.2482
CNN Epoch 5: loss = 0.2319
Test Accuracy: 89.92%
CNN Epoch 1: loss = 0.4604
CNN Epoch 2: loss = 0.3245
CNN Epoch 3: loss = 0.2931
CNN Epoch 4: loss = 0.2709
CNN Epoch 5: loss = 0.2537
Test Accuracy: 89.11%
CNN Epoch 1: loss = 0.4191
CNN Epoch 2: loss = 0.2951
CNN Epoch 3: loss = 0.2595
CNN Epoch 4: loss = 0.2364
CNN Epoch 5: loss = 0.2207
Test Accuracy: 90.52%
CNN Epoch 1: loss = 0.4055
CNN Epoch 2: loss = 0.2928
CNN Epoch 3: loss = 0.2578
CNN Epoch 4: loss = 0.2324
CNN Epoch 5: loss = 0.2168
Test Accuracy: 90.19%
CNN Epoch 1: loss = 0.4093
CNN Epoch 2: loss = 0.2920
CNN Epoch 3: loss = 0.2587
CNN Epoch 4: loss = 0.2373
CNN Epoch 5: loss = 0.2208
Test Accuracy: 90.53%
CNN Epoch 1: loss = 0.4152
CNN Epoch 2: loss = 0.2914
CNN Epoch 3: loss = 0.2578
CNN Epoch 4: loss = 0.2355
CNN Epoch 5: loss = 0.2206
Test Accuracy: 90.67%
CNN Epoch 1: loss = 0.4248
CNN Epoch 2: loss = 0.2987
CNN Epoch 3: loss = 0.2629
CNN Epoch 4: loss = 0.2435
CNN Epoch 5: loss = 0.2227
Test Accuracy: 89.14%
CNN Epoch 1: loss = 0.4502
CNN Epoch 2: loss = 0.2915
CNN Epoch 3: loss = 0.2591
CNN Epoch 4: loss = 0.2389
CNN Epoch 5: loss = 0.2204
Test Accuracy: 90.57%
CNN Epoch 1: loss = 0.4698
CNN Epoch 2: loss = 0.3063
CNN Epoch 3: loss = 0.2672
CNN Epoch 4: loss = 0.2466
CNN Epoch 5: loss = 0.2296
Test Accuracy: 89.47%
CNN Epoch 1: loss = 0.4390
CNN Epoch 2: loss = 0.3100
CNN Epoch 3: loss = 0.2751
CNN Epoch 4: loss = 0.2546
CNN Epoch 5: loss = 0.2362
Test Accuracy: 90.94%
CNN Epoch 1: loss = 0.4535
CNN Epoch 2: loss = 0.3380
CNN Epoch 3: loss = 0.3062
CNN Epoch 4: loss = 0.2846
CNN Epoch 5: loss = 0.2721
Test Accuracy: 89.14%
Best Parameters for CNN: {'batch_size': 224, 'num_classifier_epochs': 5, 'filters1': 16, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0023025013850297453}
Best Accuracy: 90.94

[I 2025-03-16 08:59:30,243] A new study created in memory with name: no-name-4606b929-76d9-485f-bf21-f9874cdb7f34
[I 2025-03-16 09:00:10,460] Trial 0 finished with value: 89.4 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'filters1': 64, 'filters2': 32, 'kernel1': 5, 'kernel2': 4, 'learning_rate': 0.000873907265195191}. Best is trial 0 with value: 89.4.
[I 2025-03-16 09:00:35,346] Trial 1 finished with value: 88.96 and parameters: {'batch_size': 256, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 64, 'kernel1': 5, 'kernel2': 3, 'learning_rate': 0.0007580754538643846}. Best is trial 0 with value: 89.4.
[I 2025-03-16 09:01:04,367] Trial 2 finished with value: 89.78 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'filters1': 16, 'filters2': 96, 'kernel1': 5, 'kernel2': 4, 'learning_rate': 0.0009526881832748308}. Best is trial 2 with value: 89.78.
[I 2025-03-16 09:01:29,943] Trial 3 finished with value: 89.03 and parameters: {'batch_size': 224, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 32, 'kernel1': 5, 'kernel2': 5, 'learning_rate': 0.0009918454367704034}. Best is trial 2 with value: 89.78.
[I 2025-03-16 09:02:08,341] Trial 4 finished with value: 89.66 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 64, 'kernel1': 4, 'kernel2': 4, 'learning_rate': 0.001371605451861138}. Best is trial 2 with value: 89.78.
[I 2025-03-16 09:02:41,003] Trial 5 finished with value: 89.68 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'filters1': 16, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0004184987681443148}. Best is trial 2 with value: 89.78.
[I 2025-03-16 09:03:43,173] Trial 6 finished with value: 89.32 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'filters1': 64, 'filters2': 128, 'kernel1': 4, 'kernel2': 5, 'learning_rate': 0.0009470373597155892}. Best is trial 2 with value: 89.78.
[I 2025-03-16 09:04:33,800] Trial 7 finished with value: 90.47 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'filters1': 48, 'filters2': 32, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0008178287967451841}. Best is trial 7 with value: 90.47.
[I 2025-03-16 09:05:02,381] Trial 8 finished with value: 88.13 and parameters: {'batch_size': 256, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 128, 'kernel1': 5, 'kernel2': 3, 'learning_rate': 0.0016686432490260414}. Best is trial 7 with value: 90.47.
[I 2025-03-16 09:05:29,035] Trial 9 finished with value: 88.08 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'filters1': 16, 'filters2': 64, 'kernel1': 4, 'kernel2': 5, 'learning_rate': 0.0004531650144436982}. Best is trial 7 with value: 90.47.
[I 2025-03-16 09:06:19,251] Trial 10 finished with value: 90.58 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'filters1': 48, 'filters2': 32, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.002231507731750756}. Best is trial 10 with value: 90.58.
[I 2025-03-16 09:07:09,466] Trial 11 finished with value: 90.05 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'filters1': 48, 'filters2': 32, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.00227114029071854}. Best is trial 10 with value: 90.58.
[I 2025-03-16 09:07:43,062] Trial 12 finished with value: 89.98 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'filters1': 48, 'filters2': 32, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.002383195306809059}. Best is trial 10 with value: 90.58.
[I 2025-03-16 09:08:34,169] Trial 13 finished with value: 89.48 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'filters1': 48, 'filters2': 32, 'kernel1': 3, 'kernel2': 4, 'learning_rate': 0.0018716482018662718}. Best is trial 10 with value: 90.58.
[I 2025-03-16 09:09:19,527] Trial 14 finished with value: 90.65 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'filters1': 64, 'filters2': 64, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0019514197471730273}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:09:59,613] Trial 15 finished with value: 89.68 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'filters1': 64, 'filters2': 64, 'kernel1': 3, 'kernel2': 4, 'learning_rate': 0.0020112284477016465}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:10:50,237] Trial 16 finished with value: 86.68 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'filters1': 64, 'filters2': 96, 'kernel1': 4, 'kernel2': 5, 'learning_rate': 0.0015082186210862967}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:11:25,032] Trial 17 finished with value: 90.3 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'filters1': 64, 'filters2': 64, 'kernel1': 3, 'kernel2': 3, 'learning_rate': 0.00210214321388737}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:12:05,184] Trial 18 finished with value: 89.87 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'filters1': 48, 'filters2': 64, 'kernel1': 4, 'kernel2': 5, 'learning_rate': 0.00244848681457679}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:12:49,529] Trial 19 finished with value: 89.6 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'filters1': 64, 'filters2': 96, 'kernel1': 3, 'kernel2': 4, 'learning_rate': 0.0017940949606504452}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:13:28,409] Trial 20 finished with value: 90.19 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'filters1': 48, 'filters2': 32, 'kernel1': 4, 'kernel2': 4, 'learning_rate': 0.00214573396583234}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:14:18,620] Trial 21 finished with value: 89.05 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'filters1': 48, 'filters2': 32, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.00010369441984316098}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:15:08,711] Trial 22 finished with value: 89.97 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'filters1': 48, 'filters2': 32, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.001286598436902601}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:15:59,959] Trial 23 finished with value: 89.91 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'filters1': 48, 'filters2': 64, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0013175625064431556}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:16:36,091] Trial 24 finished with value: 90.36 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'filters1': 64, 'filters2': 32, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0016416700032755923}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:17:15,025] Trial 25 finished with value: 90.39 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 64, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0006184790046266983}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:18:05,695] Trial 26 finished with value: 89.34 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'filters1': 48, 'filters2': 32, 'kernel1': 4, 'kernel2': 5, 'learning_rate': 0.0010947742303948772}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:18:38,578] Trial 27 finished with value: 89.94 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'filters1': 64, 'filters2': 32, 'kernel1': 3, 'kernel2': 4, 'learning_rate': 0.001896006832633419}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:19:20,102] Trial 28 finished with value: 90.39 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'filters1': 48, 'filters2': 64, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0022756460061371087}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:20:11,064] Trial 29 finished with value: 90.35 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'filters1': 64, 'filters2': 32, 'kernel1': 4, 'kernel2': 4, 'learning_rate': 0.0011866705908714923}. Best is trial 14 with value: 90.65.
[I 2025-03-16 09:20:46,432] Trial 30 finished with value: 90.74 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.001587497193518987}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:21:21,713] Trial 31 finished with value: 90.4 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0015386982669724858}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:22:03,884] Trial 32 finished with value: 89.82 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 128, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.002019771066476629}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:22:43,826] Trial 33 finished with value: 89.77 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0007557272099007155}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:23:16,182] Trial 34 finished with value: 90.17 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'filters1': 16, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0017359847176557433}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:24:07,725] Trial 35 finished with value: 90.14 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0014444617929365465}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:24:45,924] Trial 36 finished with value: 89.67 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 64, 'kernel1': 3, 'kernel2': 3, 'learning_rate': 0.0021941917926248593}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:25:34,853] Trial 37 finished with value: 90.13 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'filters1': 48, 'filters2': 128, 'kernel1': 4, 'kernel2': 5, 'learning_rate': 0.0019426155400975975}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:26:03,561] Trial 38 finished with value: 90.05 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 64, 'kernel1': 5, 'kernel2': 4, 'learning_rate': 0.0007664797921908298}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:26:53,868] Trial 39 finished with value: 89.92 and parameters: {'batch_size': 64, 'num_classifier_epochs': 5, 'filters1': 16, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0016082746524008341}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:27:27,990] Trial 40 finished with value: 89.11 and parameters: {'batch_size': 192, 'num_classifier_epochs': 5, 'filters1': 64, 'filters2': 32, 'kernel1': 4, 'kernel2': 5, 'learning_rate': 0.0024946814833924696}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:28:03,169] Trial 41 finished with value: 90.52 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.001533107578546041}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:28:38,281] Trial 42 finished with value: 90.19 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0017803409863100695}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:29:18,354] Trial 43 finished with value: 90.53 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.001115266589122292}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:29:53,571] Trial 44 finished with value: 90.67 and parameters: {'batch_size': 128, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0011552592020765175}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:30:32,595] Trial 45 finished with value: 89.14 and parameters: {'batch_size': 96, 'num_classifier_epochs': 5, 'filters1': 16, 'filters2': 128, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0010945786397326387}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:31:05,522] Trial 46 finished with value: 90.57 and parameters: {'batch_size': 224, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0009603389252030184}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:31:37,617] Trial 47 finished with value: 89.47 and parameters: {'batch_size': 256, 'num_classifier_epochs': 5, 'filters1': 32, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0009250594010642777}. Best is trial 30 with value: 90.74.
[I 2025-03-16 09:32:03,202] Trial 48 finished with value: 90.94 and parameters: {'batch_size': 224, 'num_classifier_epochs': 5, 'filters1': 16, 'filters2': 96, 'kernel1': 3, 'kernel2': 5, 'learning_rate': 0.0023025013850297453}. Best is trial 48 with value: 90.94.
[I 2025-03-16 09:32:31,745] Trial 49 finished with value: 89.14 and parameters: {'batch_size': 160, 'num_classifier_epochs': 5, 'filters1': 16, 'filters2': 128, 'kernel1': 5, 'kernel2': 3, 'learning_rate': 0.0023168633027569525}. Best is trial 48 with value: 90.94.

Test Accuracy Based on the Number of Filters in the First Conv2D Layer


Test Accuracy Based on the Number of Filters in the Second Conv2D Layer

Test Accuracy Based on Kernel Size in the First Conv2D Layer

Test Accuracy Based on Kernel Size in the Second Conv2D Layer
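
Once a study finishes, the winning configuration can be rebuilt for a final training run. The following is a minimal sketch (not executed here), assuming the study, FashionCNN, device, and train_dataset objects defined above are in scope:

Code
best = study.best_params
final_model = FashionCNN(best["filters1"], best["filters2"],
                         best["kernel1"], best["kernel2"]).to(device)
# run one dummy batch first so the lazily created fc1 exists before the
# optimizer snapshots the parameter list
with torch.no_grad():
    final_model(torch.zeros(1, 1, 28, 28, device=device))
final_optimizer = optim.Adam(final_model.parameters(), lr=best["learning_rate"])
final_loader = DataLoader(train_dataset, batch_size=best["batch_size"], shuffle=True)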

Model 4: Logistic Regression on RBM Hidden Features (Fashion MNIST Data)
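
Here the RBM is first trained unsupervised on the flattened images with one-step contrastive divergence (CD-1), and the resulting hidden-unit probabilities are used as input features for a logistic regression classifier. The surrogate loss minimized in the code below is the difference in mean free energy between a data batch and its k-step reconstruction, where the free energy of a visible vector $v$ is

$$
F(v) = -v^\top b \;-\; \sum_{j} \log\!\left(1 + e^{(Wv + c)_j}\right),
$$

with weight matrix $W$, visible bias $b$, and hidden bias $c$; the free_energy method returns the batch mean of $F(v)$.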

Click to Show Code and Output
Code
CLASSIFIER = 'LogisticRegression'

if CLASSIFIER == 'LogisticRegression':
    experiment = mlflow.set_experiment("pytorch-fmnist-lr-withrbm")
else:
    experiment = mlflow.set_experiment("pytorch-fmnist-fnn-withrbm")


class RBM(nn.Module):
    def __init__(self, n_visible=784, n_hidden=256, k=1):
        super(RBM, self).__init__()
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        # Initialize weights and biases
        self.W = nn.Parameter(torch.randn(n_hidden, n_visible) * 0.1)
        self.v_bias = nn.Parameter(torch.zeros(n_visible))
        self.h_bias = nn.Parameter(torch.zeros(n_hidden))
        self.k = k  # CD-k steps

    def sample_h(self, v):
        # Given visible v, sample hidden h
        p_h = torch.sigmoid(F.linear(v, self.W, self.h_bias))  # p(h=1|v)
        h_sample = torch.bernoulli(p_h)                        # sample Bernoulli
        return p_h, h_sample

    def sample_v(self, h):
        # Given hidden h, sample visible v
        p_v = torch.sigmoid(F.linear(h, self.W.t(), self.v_bias))  # p(v=1|h)
        v_sample = torch.bernoulli(p_v)
        return p_v, v_sample

    def forward(self, v):
        # Perform k steps of contrastive divergence starting from v
        v_k = v.clone()
        for _ in range(self.k):
            _, h_k = self.sample_h(v_k)    # sample hidden from current visible
            _, v_k = self.sample_v(h_k)    # sample visible from hidden
        return v_k  # k-step reconstructed visible

    def free_energy(self, v):
        # Compute the visible bias term for each sample in the batch
        vbias_term = (v * self.v_bias).sum(dim=1)  # shape: [batch_size]
        # Compute the activation of the hidden units
        wx_b = F.linear(v, self.W, self.h_bias)     # shape: [batch_size, n_hidden]
        # Compute the hidden term
        hidden_term = torch.sum(torch.log1p(torch.exp(wx_b)), dim=1)  # shape: [batch_size]
        # Return the mean free energy over the batch
        return - (vbias_term + hidden_term).mean()
    
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.FashionMNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.FashionMNIST(root='./data', train=False, transform=transform, download=True)

def objective(trial):
    num_rbm_epochs = trial.suggest_int("num_rbm_epochs", 5, 5)  # fixed at 5 (an earlier search used 24-33)
    batch_size = trial.suggest_int("batch_size", 192, 1024)
    rbm_lr = trial.suggest_float("rbm_lr", 0.05, 0.1)
    rbm_hidden = trial.suggest_int("rbm_hidden", 384, 8192)

    mlflow.start_run(experiment_id=experiment.experiment_id)
    if CLASSIFIER != 'LogisticRegression':
        fnn_hidden = trial.suggest_int("fnn_hidden", 192, 384)
        fnn_lr = trial.suggest_float("fnn_lr", 0.0001, 0.0025)
        mlflow.log_param("fnn_hidden", fnn_hidden)
        mlflow.log_param("fnn_lr", fnn_lr)

    num_classifier_epochs = trial.suggest_int("num_classifier_epochs", 5, 5)  # fixed at 5 (an earlier search used 40-60)

    mlflow.log_param("num_rbm_epochs", num_rbm_epochs)
    mlflow.log_param("batch_size", batch_size)
    mlflow.log_param("rbm_lr", rbm_lr)
    mlflow.log_param("rbm_hidden", rbm_hidden)
    mlflow.log_param("num_classifier_epochs", num_classifier_epochs)

    # Instantiate RBM and optimizer
    device = torch.device("mps")  # Apple-silicon GPU backend (machine-specific)
    rbm = RBM(n_visible=784, n_hidden=rbm_hidden, k=1).to(device)
    optimizer = torch.optim.SGD(rbm.parameters(), lr=rbm_lr)

    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

    rbm_training_failed = False
    # Training loop (assuming train_loader yields batches of images and labels)
    for epoch in range(num_rbm_epochs):
        total_loss = 0.0
        for images, _ in train_loader:
            # Flatten images and binarize
            v0 = images.view(-1, 784).to(rbm.W.device)      # shape [batch_size, 784]
            v0 = torch.bernoulli(v0)                        # sample binary input
            vk = rbm(v0)                                    # k-step CD reconstruction
            # Compute contrastive divergence loss (free energy difference)
            loss = rbm.free_energy(v0) - rbm.free_energy(vk)
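            # gradients flow only through the two free-energy terms (the Bernoulli
            # samples in vk are not differentiable, so no gradient flows through
            # the reconstruction), which reproduces the usual CD-k update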
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"Epoch {epoch+1}: avg free-energy loss = {total_loss/len(train_loader):.4f}")
        if np.isnan(total_loss):
            rbm_training_failed = True
            break

    if rbm_training_failed:
        accuracy = 0.0
    else:
        rbm.eval()  # good practice, though this RBM has no layers that behave differently at train time
        features_list = []
        labels_list = []
        for images, labels in train_loader:
            v = images.view(-1, 784).to(rbm.W.device)
        # use the raw normalized pixels as input (binarizing, as in training, is also an option)
            h_prob, h_sample = rbm.sample_h(v)  # get hidden activations
            features_list.append(h_prob.cpu().detach().numpy())
            labels_list.append(labels.numpy())
        train_features = np.concatenate(features_list)  # shape: [N_train, n_hidden]
        train_labels = np.concatenate(labels_list)

        # Convert pre-extracted training features and labels to tensors and create a DataLoader
        train_features_tensor = torch.tensor(train_features, dtype=torch.float32)
        train_labels_tensor = torch.tensor(train_labels, dtype=torch.long)
        train_feature_dataset = torch.utils.data.TensorDataset(train_features_tensor, train_labels_tensor)
        train_feature_loader = torch.utils.data.DataLoader(train_feature_dataset, batch_size=batch_size, shuffle=True)

            
        if CLASSIFIER == 'LogisticRegression':
            # tune the regularization strength C, mirroring the no-RBM logistic regression above
            lr_C = trial.suggest_float("lr_C", 0.01, 10.0, log=True)  
            mlflow.log_param("lr_C", lr_C)  # Log the chosen C value

            # max_iter caps saga's passes over the data; five passes may stop before full convergence
            classifier = LogisticRegression(max_iter=num_classifier_epochs, C=lr_C, solver="saga")
            classifier.fit(train_features, train_labels)            
            
        else:
            classifier = nn.Sequential(
                nn.Linear(rbm.n_hidden, fnn_hidden),
                nn.ReLU(),
                nn.Linear(fnn_hidden, 10)
            )

            # Move classifier to the same device as the RBM
            classifier = classifier.to(device)
            criterion = nn.CrossEntropyLoss()
            classifier_optimizer = torch.optim.Adam(classifier.parameters(), lr=fnn_lr)

            classifier.train()
            for epoch in range(num_classifier_epochs):
                running_loss = 0.0
                for features, labels in train_feature_loader:
                    features = features.to(device)
                    labels = labels.to(device)
                    
                    # Forward pass through classifier
                    outputs = classifier(features)
                    loss = criterion(outputs, labels)
                    
                    # Backpropagation and optimization
                    classifier_optimizer.zero_grad()
                    loss.backward()
                    classifier_optimizer.step()
                    
                    running_loss += loss.item()
                avg_loss = running_loss / len(train_feature_loader)
                print(f"Classifier Epoch {epoch+1}: loss = {avg_loss:.4f}")

        # Evaluate the classifier on test data.
        # Here we extract features from the RBM for each test image.
        if CLASSIFIER != 'LogisticRegression':
            classifier.eval()
            correct = 0
            total = 0
        features_list = []
        labels_list = []
        with torch.no_grad():
            for images, labels in test_loader:
                v = images.view(-1, 784).to(device)
                # Extract hidden activations; you can use either h_prob or h_sample.
                h_prob, _ = rbm.sample_h(v)
                if CLASSIFIER == 'LogisticRegression':
                    features_list.append(h_prob.cpu().detach().numpy())
                    labels_list.append(labels.numpy())
                else:
                    outputs = classifier(h_prob)
                    _, predicted = torch.max(outputs.data, 1)
                    total += labels.size(0)
                    correct += (predicted.cpu() == labels).sum().item()

        if CLASSIFIER == 'LogisticRegression':
            test_features = np.concatenate(features_list)
            test_labels = np.concatenate(labels_list)
            predictions = classifier.predict(test_features)
            accuracy = accuracy_score(test_labels, predictions) * 100
        else:
            accuracy = 100 * correct / total

        print(f"Test Accuracy: {accuracy:.2f}%")

    mlflow.log_metric("test_accuracy", accuracy)
    mlflow.end_run()

    return accuracy

if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=50)
    print(study.best_params)
    print(study.best_value)
    print(study.best_trial)
Epoch 1: avg free-energy loss = 11.0983
Epoch 2: avg free-energy loss = 0.8902
Epoch 3: avg free-energy loss = 0.2075
Epoch 4: avg free-energy loss = -0.0848
Epoch 5: avg free-energy loss = -0.1905
Test Accuracy: 86.31%
Epoch 1: avg free-energy loss = 286.4975
Epoch 2: avg free-energy loss = 77.2576
Epoch 3: avg free-energy loss = 51.3040
Epoch 4: avg free-energy loss = 38.3931
Epoch 5: avg free-energy loss = 32.1150
Test Accuracy: 86.17%
Epoch 1: avg free-energy loss = 8.7194
Epoch 2: avg free-energy loss = -4.1874
Epoch 3: avg free-energy loss = -3.7265
Epoch 4: avg free-energy loss = -3.6580
Epoch 5: avg free-energy loss = -3.6526
Test Accuracy: 84.77%
Epoch 1: avg free-energy loss = 445.2913
Epoch 2: avg free-energy loss = 136.5778
Epoch 3: avg free-energy loss = 94.3315
Epoch 4: avg free-energy loss = 69.3253
Epoch 5: avg free-energy loss = 57.2865
Test Accuracy: 86.33%
Epoch 1: avg free-energy loss = 226.6175
Epoch 2: avg free-energy loss = 57.4454
Epoch 3: avg free-energy loss = 37.1607
Epoch 4: avg free-energy loss = 28.3023
Epoch 5: avg free-energy loss = 22.9894
Test Accuracy: 85.95%
Epoch 1: avg free-energy loss = 42.4242
Epoch 2: avg free-energy loss = 6.9989
Epoch 3: avg free-energy loss = 5.5995
Epoch 4: avg free-energy loss = 4.3436
Epoch 5: avg free-energy loss = 3.4054
Test Accuracy: 86.42%
Epoch 1: avg free-energy loss = 208.0187
Epoch 2: avg free-energy loss = 55.5192
Epoch 3: avg free-energy loss = 38.6505
Epoch 4: avg free-energy loss = 29.5319
Epoch 5: avg free-energy loss = 25.2174
Test Accuracy: 86.22%
Epoch 1: avg free-energy loss = 162.0240
Epoch 2: avg free-energy loss = 40.0159
Epoch 3: avg free-energy loss = 27.6307
Epoch 4: avg free-energy loss = 21.5389
Epoch 5: avg free-energy loss = 18.3828
Test Accuracy: 86.14%
Epoch 1: avg free-energy loss = 63.8246
Epoch 2: avg free-energy loss = 15.7824
Epoch 3: avg free-energy loss = 10.4482
Epoch 4: avg free-energy loss = 8.0529
Epoch 5: avg free-energy loss = 6.8513
Test Accuracy: 86.66%
Epoch 1: avg free-energy loss = -5.4026
Epoch 2: avg free-energy loss = -15.3499
Epoch 3: avg free-energy loss = -13.0663
Epoch 4: avg free-energy loss = -11.6247
Epoch 5: avg free-energy loss = -10.7206
Test Accuracy: 83.87%
Epoch 1: avg free-energy loss = 62.0276
Epoch 2: avg free-energy loss = 19.2369
Epoch 3: avg free-energy loss = 14.2759
Epoch 4: avg free-energy loss = 11.6409
Epoch 5: avg free-energy loss = 9.7695
Test Accuracy: 86.89%
Epoch 1: avg free-energy loss = 59.6724
Epoch 2: avg free-energy loss = 17.9591
Epoch 3: avg free-energy loss = 13.0873
Epoch 4: avg free-energy loss = 10.7980
Epoch 5: avg free-energy loss = 9.1732
Test Accuracy: 86.81%
Epoch 1: avg free-energy loss = 69.2635
Epoch 2: avg free-energy loss = 21.3186
Epoch 3: avg free-energy loss = 15.6044
Epoch 4: avg free-energy loss = 12.8041
Epoch 5: avg free-energy loss = 11.0498
Test Accuracy: 86.78%
Epoch 1: avg free-energy loss = 71.0654
Epoch 2: avg free-energy loss = 20.9858
Epoch 3: avg free-energy loss = 14.9994
Epoch 4: avg free-energy loss = 12.1078
Epoch 5: avg free-energy loss = 10.0989
Test Accuracy: 86.69%
Epoch 1: avg free-energy loss = 68.4150
Epoch 2: avg free-energy loss = 19.5770
Epoch 3: avg free-energy loss = 13.7867
Epoch 4: avg free-energy loss = 11.1159
Epoch 5: avg free-energy loss = 9.4838
Test Accuracy: 86.59%
Epoch 1: avg free-energy loss = 170.5432
Epoch 2: avg free-energy loss = 49.2767
Epoch 3: avg free-energy loss = 32.4581
Epoch 4: avg free-energy loss = 24.9453
Epoch 5: avg free-energy loss = 20.5663
Test Accuracy: 86.39%
Epoch 1: avg free-energy loss = 176.3080
Epoch 2: avg free-energy loss = 51.5368
Epoch 3: avg free-energy loss = 34.5170
Epoch 4: avg free-energy loss = 25.8527
Epoch 5: avg free-energy loss = 21.3832
Test Accuracy: 86.15%
Epoch 1: avg free-energy loss = 251.0024
Epoch 2: avg free-energy loss = 61.2535
Epoch 3: avg free-energy loss = 40.6592
Epoch 4: avg free-energy loss = 31.2130
Epoch 5: avg free-energy loss = 25.9338
Test Accuracy: 86.40%
Epoch 1: avg free-energy loss = 54.3958
Epoch 2: avg free-energy loss = 14.7653
Epoch 3: avg free-energy loss = 10.2451
Epoch 4: avg free-energy loss = 8.2519
Epoch 5: avg free-energy loss = 7.1040
Test Accuracy: 86.52%
Epoch 1: avg free-energy loss = 121.2490
Epoch 2: avg free-energy loss = 34.7991
Epoch 3: avg free-energy loss = 23.8062
Epoch 4: avg free-energy loss = 18.9202
Epoch 5: avg free-energy loss = 16.0589
Test Accuracy: 86.41%
Epoch 1: avg free-energy loss = 118.5108
Epoch 2: avg free-energy loss = 32.6930
Epoch 3: avg free-energy loss = 21.5124
Epoch 4: avg free-energy loss = 15.4574
Epoch 5: avg free-energy loss = 12.5273
Test Accuracy: 86.62%
Epoch 1: avg free-energy loss = 75.2110
Epoch 2: avg free-energy loss = 23.5414
Epoch 3: avg free-energy loss = 17.5267
Epoch 4: avg free-energy loss = 14.3113
Epoch 5: avg free-energy loss = 12.2548
Test Accuracy: 86.82%
Epoch 1: avg free-energy loss = 73.3781
Epoch 2: avg free-energy loss = 23.6767
Epoch 3: avg free-energy loss = 17.8470
Epoch 4: avg free-energy loss = 14.7799
Epoch 5: avg free-energy loss = 12.7191
Test Accuracy: 86.77%
Epoch 1: avg free-energy loss = 128.2982
Epoch 2: avg free-energy loss = 37.1548
Epoch 3: avg free-energy loss = 25.4427
Epoch 4: avg free-energy loss = 20.0800
Epoch 5: avg free-energy loss = 17.0188
Test Accuracy: 86.36%
Epoch 1: avg free-energy loss = 59.5652
Epoch 2: avg free-energy loss = 16.4695
Epoch 3: avg free-energy loss = 11.3858
Epoch 4: avg free-energy loss = 9.4491
Epoch 5: avg free-energy loss = 8.0241
Test Accuracy: 86.73%
Epoch 1: avg free-energy loss = 90.3097
Epoch 2: avg free-energy loss = 26.4820
Epoch 3: avg free-energy loss = 18.5732
Epoch 4: avg free-energy loss = 15.0498
Epoch 5: avg free-energy loss = 12.7559
Test Accuracy: 86.52%
Epoch 1: avg free-energy loss = -15.1602
Epoch 2: avg free-energy loss = -12.0583
Epoch 3: avg free-energy loss = -9.6985
Epoch 4: avg free-energy loss = -7.9796
Epoch 5: avg free-energy loss = -6.4301
Test Accuracy: 85.51%
Epoch 1: avg free-energy loss = 42.3860
Epoch 2: avg free-energy loss = 9.7808
Epoch 3: avg free-energy loss = 6.1572
Epoch 4: avg free-energy loss = 4.5713
Epoch 5: avg free-energy loss = 3.7257
Test Accuracy: 86.03%
Epoch 1: avg free-energy loss = 81.7266
Epoch 2: avg free-energy loss = 21.4086
Epoch 3: avg free-energy loss = 14.7284
Epoch 4: avg free-energy loss = 11.7900
Epoch 5: avg free-energy loss = 9.8858
Test Accuracy: 86.49%
Epoch 1: avg free-energy loss = 20.0578
Epoch 2: avg free-energy loss = 3.7557
Epoch 3: avg free-energy loss = 2.0682
Epoch 4: avg free-energy loss = 1.5203
Epoch 5: avg free-energy loss = 1.1687
Test Accuracy: 84.50%
Epoch 1: avg free-energy loss = 134.6919
Epoch 2: avg free-energy loss = 37.8646
Epoch 3: avg free-energy loss = 26.5401
Epoch 4: avg free-energy loss = 21.3446
Epoch 5: avg free-energy loss = 18.3865
Test Accuracy: 86.38%
Epoch 1: avg free-energy loss = 70.4801
Epoch 2: avg free-energy loss = 21.6718
Epoch 3: avg free-energy loss = 15.8365
Epoch 4: avg free-energy loss = 13.1679
Epoch 5: avg free-energy loss = 11.2656
Test Accuracy: 86.77%
Epoch 1: avg free-energy loss = 108.5772
Epoch 2: avg free-energy loss = 31.0614
Epoch 3: avg free-energy loss = 21.3749
Epoch 4: avg free-energy loss = 17.0071
Epoch 5: avg free-energy loss = 14.4444
Test Accuracy: 86.46%
Epoch 1: avg free-energy loss = 153.8949
Epoch 2: avg free-energy loss = 42.8469
Epoch 3: avg free-energy loss = 29.0603
Epoch 4: avg free-energy loss = 23.3826
Epoch 5: avg free-energy loss = 19.2631
Test Accuracy: 86.61%
Epoch 1: avg free-energy loss = 47.5472
Epoch 2: avg free-energy loss = 13.4522
Epoch 3: avg free-energy loss = 9.3553
Epoch 4: avg free-energy loss = 7.3195
Epoch 5: avg free-energy loss = 6.0742
Test Accuracy: 86.46%
Epoch 1: avg free-energy loss = 73.2760
Epoch 2: avg free-energy loss = 20.8476
Epoch 3: avg free-energy loss = 15.1852
Epoch 4: avg free-energy loss = 12.5364
Epoch 5: avg free-energy loss = 11.0951
Test Accuracy: 86.43%
Epoch 1: avg free-energy loss = 175.5370
Epoch 2: avg free-energy loss = 52.3486
Epoch 3: avg free-energy loss = 34.1365
Epoch 4: avg free-energy loss = 25.9829
Epoch 5: avg free-energy loss = 21.1861
Test Accuracy: 86.05%
Epoch 1: avg free-energy loss = 36.7464
Epoch 2: avg free-energy loss = 10.3793
Epoch 3: avg free-energy loss = 7.6167
Epoch 4: avg free-energy loss = 6.3552
Epoch 5: avg free-energy loss = 5.4945
Test Accuracy: 86.38%
Epoch 1: avg free-energy loss = 178.4982
Epoch 2: avg free-energy loss = 50.6123
Epoch 3: avg free-energy loss = 33.0193
Epoch 4: avg free-energy loss = 25.1056
Epoch 5: avg free-energy loss = 20.3579
Test Accuracy: 86.35%
Epoch 1: avg free-energy loss = 147.5758
Epoch 2: avg free-energy loss = 43.5731
Epoch 3: avg free-energy loss = 30.1178
Epoch 4: avg free-energy loss = 24.4351
Epoch 5: avg free-energy loss = 21.0490
Test Accuracy: 86.39%
Epoch 1: avg free-energy loss = 16.3335
Epoch 2: avg free-energy loss = 1.5618
Epoch 3: avg free-energy loss = 0.2115
Epoch 4: avg free-energy loss = -0.2513
Epoch 5: avg free-energy loss = -0.4366
Test Accuracy: 86.16%
Epoch 1: avg free-energy loss = 66.2670
Epoch 2: avg free-energy loss = 21.0264
Epoch 3: avg free-energy loss = 15.6072
Epoch 4: avg free-energy loss = 13.0578
Epoch 5: avg free-energy loss = 11.3276
Test Accuracy: 86.90%
Epoch 1: avg free-energy loss = 78.8883
Epoch 2: avg free-energy loss = 24.7488
Epoch 3: avg free-energy loss = 18.3049
Epoch 4: avg free-energy loss = 15.0500
Epoch 5: avg free-energy loss = 12.8224
Test Accuracy: 86.73%
Epoch 1: avg free-energy loss = 58.9275
Epoch 2: avg free-energy loss = 16.7089
Epoch 3: avg free-energy loss = 11.9911
Epoch 4: avg free-energy loss = 9.9277
Epoch 5: avg free-energy loss = 8.6075
Test Accuracy: 86.52%
Epoch 1: avg free-energy loss = 136.8138
Epoch 2: avg free-energy loss = 36.0843
Epoch 3: avg free-energy loss = 24.0204
Epoch 4: avg free-energy loss = 17.8956
Epoch 5: avg free-energy loss = 14.2844
Test Accuracy: 86.14%
Epoch 1: avg free-energy loss = 77.1562
Epoch 2: avg free-energy loss = 24.7295
Epoch 3: avg free-energy loss = 18.4784
Epoch 4: avg free-energy loss = 15.5181
Epoch 5: avg free-energy loss = 13.7065
Test Accuracy: 86.81%
Epoch 1: avg free-energy loss = 115.9389
Epoch 2: avg free-energy loss = 32.9406
Epoch 3: avg free-energy loss = 24.0169
Epoch 4: avg free-energy loss = 19.7586
Epoch 5: avg free-energy loss = 17.3623
Test Accuracy: 86.49%
Epoch 1: avg free-energy loss = 85.4472
Epoch 2: avg free-energy loss = 27.6156
Epoch 3: avg free-energy loss = 20.8374
Epoch 4: avg free-energy loss = 17.3925
Epoch 5: avg free-energy loss = 15.0158
Test Accuracy: 86.62%
Epoch 1: avg free-energy loss = 146.4544
Epoch 2: avg free-energy loss = 41.5493
Epoch 3: avg free-energy loss = 25.9868
Epoch 4: avg free-energy loss = 20.3851
Epoch 5: avg free-energy loss = 17.0355
Test Accuracy: 86.25%
Epoch 1: avg free-energy loss = 17.7498
Epoch 2: avg free-energy loss = 6.4830
Epoch 3: avg free-energy loss = 5.2918
Epoch 4: avg free-energy loss = 4.6252
Epoch 5: avg free-energy loss = 4.0359
Test Accuracy: 85.85%
{'num_rbm_epochs': 5, 'batch_size': 196, 'rbm_lr': 0.0807587561330031, 'rbm_hidden': 3531, 'num_classifier_epochs': 5, 'lr_C': 0.14287072737019002}
86.9
FrozenTrial(number=41, state=1, values=[86.9], datetime_start=datetime.datetime(2025, 3, 16, 10, 12, 6, 999791), datetime_complete=datetime.datetime(2025, 3, 16, 10, 13, 2, 754404), params={'num_rbm_epochs': 5, 'batch_size': 196, 'rbm_lr': 0.0807587561330031, 'rbm_hidden': 3531, 'num_classifier_epochs': 5, 'lr_C': 0.14287072737019002}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'num_rbm_epochs': IntDistribution(high=5, log=False, low=5, step=1), 'batch_size': IntDistribution(high=1024, log=False, low=192, step=1), 'rbm_lr': FloatDistribution(high=0.1, log=False, low=0.05, step=None), 'rbm_hidden': IntDistribution(high=8192, log=False, low=384, step=1), 'num_classifier_epochs': IntDistribution(high=5, log=False, low=5, step=1), 'lr_C': FloatDistribution(high=10.0, log=True, low=0.01, step=None)}, trial_id=41, value=None)

[I 2025-03-16 09:32:32,166] A new study created in memory with name: no-name-5084138e-6b39-4c7a-bc20-7d83d940819a
[I 2025-03-16 09:33:03,033] Trial 0 finished with value: 86.31 and parameters: {'num_rbm_epochs': 5, 'batch_size': 359, 'rbm_lr': 0.07082212990297065, 'rbm_hidden': 1477, 'num_classifier_epochs': 5, 'lr_C': 8.835713695317322}. Best is trial 0 with value: 86.31.
[I 2025-03-16 09:34:20,236] Trial 1 finished with value: 86.17 and parameters: {'num_rbm_epochs': 5, 'batch_size': 886, 'rbm_lr': 0.06781872334879409, 'rbm_hidden': 6412, 'num_classifier_epochs': 5, 'lr_C': 0.08224481414712612}. Best is trial 0 with value: 86.31.
[I 2025-03-16 09:34:46,714] Trial 2 finished with value: 84.77 and parameters: {'num_rbm_epochs': 5, 'batch_size': 653, 'rbm_lr': 0.06974344884060452, 'rbm_hidden': 1277, 'num_classifier_epochs': 5, 'lr_C': 0.025949735815424455}. Best is trial 0 with value: 86.31.
[I 2025-03-16 09:36:07,854] Trial 3 finished with value: 86.33 and parameters: {'num_rbm_epochs': 5, 'batch_size': 955, 'rbm_lr': 0.09667189914495639, 'rbm_hidden': 6898, 'num_classifier_epochs': 5, 'lr_C': 0.9468540885119474}. Best is trial 3 with value: 86.33.
[I 2025-03-16 09:39:01,602] Trial 4 finished with value: 85.95 and parameters: {'num_rbm_epochs': 5, 'batch_size': 903, 'rbm_lr': 0.06356933394044362, 'rbm_hidden': 5361, 'num_classifier_epochs': 5, 'lr_C': 5.214457750336397}. Best is trial 3 with value: 86.33.
[I 2025-03-16 09:39:33,261] Trial 5 finished with value: 86.42 and parameters: {'num_rbm_epochs': 5, 'batch_size': 977, 'rbm_lr': 0.06281003031008778, 'rbm_hidden': 1862, 'num_classifier_epochs': 5, 'lr_C': 8.053382623397393}. Best is trial 5 with value: 86.42.
[I 2025-03-16 09:40:56,098] Trial 6 finished with value: 86.22 and parameters: {'num_rbm_epochs': 5, 'batch_size': 413, 'rbm_lr': 0.0726269166016703, 'rbm_hidden': 6662, 'num_classifier_epochs': 5, 'lr_C': 1.8923652438091352}. Best is trial 5 with value: 86.42.
[I 2025-03-16 09:42:05,694] Trial 7 finished with value: 86.14 and parameters: {'num_rbm_epochs': 5, 'batch_size': 535, 'rbm_lr': 0.061458172740805866, 'rbm_hidden': 5596, 'num_classifier_epochs': 5, 'lr_C': 1.112689652207355}. Best is trial 5 with value: 86.42.
[I 2025-03-16 09:42:42,416] Trial 8 finished with value: 86.66 and parameters: {'num_rbm_epochs': 5, 'batch_size': 624, 'rbm_lr': 0.08334754480626172, 'rbm_hidden': 2361, 'num_classifier_epochs': 5, 'lr_C': 5.356776285152523}. Best is trial 8 with value: 86.66.
[I 2025-03-16 09:43:05,326] Trial 9 finished with value: 83.87 and parameters: {'num_rbm_epochs': 5, 'batch_size': 904, 'rbm_lr': 0.05141410171128126, 'rbm_hidden': 940, 'num_classifier_epochs': 5, 'lr_C': 0.018660542088673522}. Best is trial 8 with value: 86.66.
[I 2025-03-16 09:45:25,796] Trial 10 finished with value: 86.89 and parameters: {'num_rbm_epochs': 5, 'batch_size': 192, 'rbm_lr': 0.08894235093429392, 'rbm_hidden': 3349, 'num_classifier_epochs': 5, 'lr_C': 0.29116055708796795}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:46:16,016] Trial 11 finished with value: 86.81 and parameters: {'num_rbm_epochs': 5, 'batch_size': 207, 'rbm_lr': 0.08705678135295011, 'rbm_hidden': 3196, 'num_classifier_epochs': 5, 'lr_C': 0.1974280578701699}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:47:09,100] Trial 12 finished with value: 86.78 and parameters: {'num_rbm_epochs': 5, 'batch_size': 214, 'rbm_lr': 0.0854773835116236, 'rbm_hidden': 3519, 'num_classifier_epochs': 5, 'lr_C': 0.1941874065462356}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:48:03,063] Trial 13 finished with value: 86.69 and parameters: {'num_rbm_epochs': 5, 'batch_size': 208, 'rbm_lr': 0.09565175396887042, 'rbm_hidden': 3567, 'num_classifier_epochs': 5, 'lr_C': 0.3077082015380531}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:48:48,799] Trial 14 finished with value: 86.59 and parameters: {'num_rbm_epochs': 5, 'batch_size': 339, 'rbm_lr': 0.08397649547810673, 'rbm_hidden': 3023, 'num_classifier_epochs': 5, 'lr_C': 0.08627420381743349}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:49:49,771] Trial 15 finished with value: 86.39 and parameters: {'num_rbm_epochs': 5, 'batch_size': 495, 'rbm_lr': 0.08924599095145722, 'rbm_hidden': 4642, 'num_classifier_epochs': 5, 'lr_C': 0.10140343535544784}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:50:44,843] Trial 16 finished with value: 86.15 and parameters: {'num_rbm_epochs': 5, 'batch_size': 760, 'rbm_lr': 0.07814346793694502, 'rbm_hidden': 4201, 'num_classifier_epochs': 5, 'lr_C': 0.6228289364475224}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:52:24,303] Trial 17 finished with value: 86.4 and parameters: {'num_rbm_epochs': 5, 'batch_size': 276, 'rbm_lr': 0.09308178057282553, 'rbm_hidden': 7920, 'num_classifier_epochs': 5, 'lr_C': 0.04792887755737209}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:53:03,201] Trial 18 finished with value: 86.52 and parameters: {'num_rbm_epochs': 5, 'batch_size': 444, 'rbm_lr': 0.07747651129524147, 'rbm_hidden': 2479, 'num_classifier_epochs': 5, 'lr_C': 0.22968596789394674}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:54:03,660] Trial 19 finished with value: 86.41 and parameters: {'num_rbm_epochs': 5, 'batch_size': 306, 'rbm_lr': 0.08917695499496754, 'rbm_hidden': 4446, 'num_classifier_epochs': 5, 'lr_C': 0.48801888936814825}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:54:47,183] Trial 20 finished with value: 86.61999999999999 and parameters: {'num_rbm_epochs': 5, 'batch_size': 737, 'rbm_lr': 0.09946909188224887, 'rbm_hidden': 3039, 'num_classifier_epochs': 5, 'lr_C': 1.9861785071452491}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:55:45,482] Trial 21 finished with value: 86.82 and parameters: {'num_rbm_epochs': 5, 'batch_size': 197, 'rbm_lr': 0.08425785099052568, 'rbm_hidden': 3847, 'num_classifier_epochs': 5, 'lr_C': 0.1842737797682127}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:56:42,591] Trial 22 finished with value: 86.77 and parameters: {'num_rbm_epochs': 5, 'batch_size': 205, 'rbm_lr': 0.08166735979078468, 'rbm_hidden': 3788, 'num_classifier_epochs': 5, 'lr_C': 0.1510957524149814}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:57:50,293] Trial 23 finished with value: 86.36 and parameters: {'num_rbm_epochs': 5, 'batch_size': 267, 'rbm_lr': 0.0898483076045887, 'rbm_hidden': 4931, 'num_classifier_epochs': 5, 'lr_C': 0.04719820692838271}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:58:31,631] Trial 24 finished with value: 86.72999999999999 and parameters: {'num_rbm_epochs': 5, 'batch_size': 392, 'rbm_lr': 0.07985226808103989, 'rbm_hidden': 2650, 'num_classifier_epochs': 5, 'lr_C': 0.4263969998257128}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:59:27,599] Trial 25 finished with value: 86.52 and parameters: {'num_rbm_epochs': 5, 'batch_size': 272, 'rbm_lr': 0.0873337802841949, 'rbm_hidden': 3858, 'num_classifier_epochs': 5, 'lr_C': 0.13828784430866584}. Best is trial 10 with value: 86.89.
[I 2025-03-16 09:59:51,691] Trial 26 finished with value: 85.50999999999999 and parameters: {'num_rbm_epochs': 5, 'batch_size': 193, 'rbm_lr': 0.07546741104690456, 'rbm_hidden': 574, 'num_classifier_epochs': 5, 'lr_C': 0.31426338023241335}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:00:26,057] Trial 27 finished with value: 86.03 and parameters: {'num_rbm_epochs': 5, 'batch_size': 483, 'rbm_lr': 0.09289820752462112, 'rbm_hidden': 2041, 'num_classifier_epochs': 5, 'lr_C': 0.05825183072669943}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:01:13,679] Trial 28 finished with value: 86.49 and parameters: {'num_rbm_epochs': 5, 'batch_size': 339, 'rbm_lr': 0.09133399903938651, 'rbm_hidden': 3213, 'num_classifier_epochs': 5, 'lr_C': 0.7269628021171956}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:01:46,217] Trial 29 finished with value: 84.5 and parameters: {'num_rbm_epochs': 5, 'batch_size': 370, 'rbm_lr': 0.08080041824936879, 'rbm_hidden': 1655, 'num_classifier_epochs': 5, 'lr_C': 0.011084983488034527}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:02:57,246] Trial 30 finished with value: 86.38 and parameters: {'num_rbm_epochs': 5, 'batch_size': 258, 'rbm_lr': 0.08620299540338233, 'rbm_hidden': 5205, 'num_classifier_epochs': 5, 'lr_C': 1.7187367636999638}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:03:50,010] Trial 31 finished with value: 86.77 and parameters: {'num_rbm_epochs': 5, 'batch_size': 232, 'rbm_lr': 0.08537503633460365, 'rbm_hidden': 3473, 'num_classifier_epochs': 5, 'lr_C': 0.18636977397295879}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:04:48,383] Trial 32 finished with value: 86.46000000000001 and parameters: {'num_rbm_epochs': 5, 'batch_size': 310, 'rbm_lr': 0.08629290579664917, 'rbm_hidden': 4119, 'num_classifier_epochs': 5, 'lr_C': 0.24102104826922438}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:06:07,151] Trial 33 finished with value: 86.61 and parameters: {'num_rbm_epochs': 5, 'batch_size': 234, 'rbm_lr': 0.09498818572072028, 'rbm_hidden': 5864, 'num_classifier_epochs': 5, 'lr_C': 0.14950798292950482}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:06:54,338] Trial 34 finished with value: 86.46000000000001 and parameters: {'num_rbm_epochs': 5, 'batch_size': 192, 'rbm_lr': 0.09968591512935084, 'rbm_hidden': 2803, 'num_classifier_epochs': 5, 'lr_C': 0.10636654764831616}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:07:43,699] Trial 35 finished with value: 86.42999999999999 and parameters: {'num_rbm_epochs': 5, 'batch_size': 299, 'rbm_lr': 0.07399681270480596, 'rbm_hidden': 3394, 'num_classifier_epochs': 5, 'lr_C': 0.43356088918136787}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:08:44,168] Trial 36 finished with value: 86.05000000000001 and parameters: {'num_rbm_epochs': 5, 'batch_size': 548, 'rbm_lr': 0.08274332162203847, 'rbm_hidden': 4710, 'num_classifier_epochs': 5, 'lr_C': 0.034517957507472934}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:09:22,187] Trial 37 finished with value: 86.38 and parameters: {'num_rbm_epochs': 5, 'batch_size': 350, 'rbm_lr': 0.06658356404955273, 'rbm_hidden': 2270, 'num_classifier_epochs': 5, 'lr_C': 0.06721500518327761}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:10:16,386] Trial 38 finished with value: 86.35000000000001 and parameters: {'num_rbm_epochs': 5, 'batch_size': 700, 'rbm_lr': 0.08840204284266104, 'rbm_hidden': 4102, 'num_classifier_epochs': 5, 'lr_C': 1.1288224387428651}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:11:36,232] Trial 39 finished with value: 86.39 and parameters: {'num_rbm_epochs': 5, 'batch_size': 240, 'rbm_lr': 0.07794122474750537, 'rbm_hidden': 6012, 'num_classifier_epochs': 5, 'lr_C': 0.2295093483905593}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:12:06,999] Trial 40 finished with value: 86.16 and parameters: {'num_rbm_epochs': 5, 'batch_size': 413, 'rbm_lr': 0.09121257967490005, 'rbm_hidden': 1494, 'num_classifier_epochs': 5, 'lr_C': 0.36029069747329506}. Best is trial 10 with value: 86.89.
[I 2025-03-16 10:13:02,754] Trial 41 finished with value: 86.9 and parameters: {'num_rbm_epochs': 5, 'batch_size': 196, 'rbm_lr': 0.0807587561330031, 'rbm_hidden': 3531, 'num_classifier_epochs': 5, 'lr_C': 0.14287072737019002}. Best is trial 41 with value: 86.9.
[I 2025-03-16 10:13:59,756] Trial 42 finished with value: 86.72999999999999 and parameters: {'num_rbm_epochs': 5, 'batch_size': 239, 'rbm_lr': 0.08436111642708782, 'rbm_hidden': 3722, 'num_classifier_epochs': 5, 'lr_C': 0.12613294304922887}. Best is trial 41 with value: 86.9.
[I 2025-03-16 10:14:43,787] Trial 43 finished with value: 86.52 and parameters: {'num_rbm_epochs': 5, 'batch_size': 312, 'rbm_lr': 0.08112514115745295, 'rbm_hidden': 2857, 'num_classifier_epochs': 5, 'lr_C': 0.18592114869205434}. Best is trial 41 with value: 86.9.
[I 2025-03-16 10:15:28,164] Trial 44 finished with value: 86.14 and parameters: {'num_rbm_epochs': 5, 'batch_size': 1013, 'rbm_lr': 0.07227971951354141, 'rbm_hidden': 3198, 'num_classifier_epochs': 5, 'lr_C': 0.08100862274118732}. Best is trial 41 with value: 86.9.
[I 2025-03-16 10:16:29,535] Trial 45 finished with value: 86.81 and parameters: {'num_rbm_epochs': 5, 'batch_size': 218, 'rbm_lr': 0.07612024454543397, 'rbm_hidden': 3977, 'num_classifier_epochs': 5, 'lr_C': 0.2669389233662529}. Best is trial 41 with value: 86.9.
[I 2025-03-16 10:17:40,195] Trial 46 finished with value: 86.49 and parameters: {'num_rbm_epochs': 5, 'batch_size': 275, 'rbm_lr': 0.06957818596292636, 'rbm_hidden': 5110, 'num_classifier_epochs': 5, 'lr_C': 0.6330923240954488}. Best is trial 41 with value: 86.9.
[I 2025-03-16 10:18:45,323] Trial 47 finished with value: 86.61999999999999 and parameters: {'num_rbm_epochs': 5, 'batch_size': 195, 'rbm_lr': 0.07645354649494197, 'rbm_hidden': 4376, 'num_classifier_epochs': 5, 'lr_C': 0.30098936993636133}. Best is trial 41 with value: 86.9.
[I 2025-03-16 10:19:38,094] Trial 48 finished with value: 86.25 and parameters: {'num_rbm_epochs': 5, 'batch_size': 826, 'rbm_lr': 0.0662127771226733, 'rbm_hidden': 3946, 'num_classifier_epochs': 5, 'lr_C': 0.27108307199174936}. Best is trial 41 with value: 86.9.
[I 2025-03-16 10:20:16,730] Trial 49 finished with value: 85.85000000000001 and parameters: {'num_rbm_epochs': 5, 'batch_size': 242, 'rbm_lr': 0.051328901823061976, 'rbm_hidden': 1948, 'num_classifier_epochs': 5, 'lr_C': 0.10535807924840984}. Best is trial 41 with value: 86.9.
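Across the 50 trials above, the best configuration for logistic regression on RBM hidden features was trial 41, reaching 86.90% test accuracy with a batch size of 196, an RBM learning rate of about 0.081, 3531 hidden units, and C ≈ 0.14. The figures below show how test accuracy varied with the inverse regularization strength C and with the number of RBM hidden units.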

[Figure: Test Accuracy of Logistic Regression on RBM Hidden Features by Inverse Regularization Strength]

[Figure: Test Accuracy by Number of RBM Hidden Units]

Model 5: Feed Forward Network on RBM Hidden Features (of Fashion MNIST Data)
Click to Show Code and Output
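As with the logistic-regression variant above, this model first trains an RBM on binarized Fashion MNIST images using one-step contrastive divergence (CD-1), then feeds the hidden-unit activation probabilities to a downstream classifier; here the classifier is a feed-forward network with a single ReLU hidden layer. Optuna tunes the batch size, the RBM learning rate and hidden width, and the FNN learning rate and hidden width, while each trial's parameters and test accuracy are logged to MLflow.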
Code
# Imports assumed from earlier sections of the notebook, repeated here so the
# block runs standalone.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import mlflow
import optuna

CLASSIFIER = 'FNN'

if CLASSIFIER == 'LogisticRegression':
    experiment = mlflow.set_experiment("pytorch-fmnist-lr-withrbm")
else:
    experiment = mlflow.set_experiment("pytorch-fmnist-fnn-withrbm")


class RBM(nn.Module):
    def __init__(self, n_visible=784, n_hidden=256, k=1):
        super(RBM, self).__init__()
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        # Initialize weights and biases
        self.W = nn.Parameter(torch.randn(n_hidden, n_visible) * 0.1)
        self.v_bias = nn.Parameter(torch.zeros(n_visible))
        self.h_bias = nn.Parameter(torch.zeros(n_hidden))
        self.k = k  # CD-k steps

    def sample_h(self, v):
        # Given visible v, sample hidden h
        p_h = torch.sigmoid(F.linear(v, self.W, self.h_bias))  # p(h=1|v)
        h_sample = torch.bernoulli(p_h)                        # sample Bernoulli
        return p_h, h_sample

    def sample_v(self, h):
        # Given hidden h, sample visible v
        p_v = torch.sigmoid(F.linear(h, self.W.t(), self.v_bias))  # p(v=1|h)
        v_sample = torch.bernoulli(p_v)
        return p_v, v_sample

    def forward(self, v):
        # Perform k steps of contrastive divergence starting from v
        v_k = v.clone()
        for _ in range(self.k):
            _, h_k = self.sample_h(v_k)    # sample hidden from current visible
            _, v_k = self.sample_v(h_k)    # sample visible from hidden
        return v_k  # k-step reconstructed visible

    def free_energy(self, v):
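        # Free energy of a visible vector v:
        #   F(v) = -v . v_bias - sum_j log(1 + exp(W_j . v + h_bias_j))
        # The training loop below minimizes F(v0) - F(vk), whose gradient
        # approximates the contrastive-divergence parameter update.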
        # Compute the visible bias term for each sample in the batch
        vbias_term = (v * self.v_bias).sum(dim=1)  # shape: [batch_size]
        # Compute the activation of the hidden units
        wx_b = F.linear(v, self.W, self.h_bias)     # shape: [batch_size, n_hidden]
        # Compute the hidden term
        hidden_term = torch.sum(torch.log1p(torch.exp(wx_b)), dim=1)  # shape: [batch_size]
        # Return the mean free energy over the batch
        return - (vbias_term + hidden_term).mean()
    
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.FashionMNIST(root='./data', train=True, transform=transform, download=True)
test_dataset = datasets.FashionMNIST(root='./data', train=False, transform=transform, download=True)

def objective(trial):
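    # Optuna objective: sample one hyperparameter configuration, train the
    # RBM-plus-classifier pipeline, and return its test accuracy, which the
    # study below maximizes.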
    num_rbm_epochs = trial.suggest_int("num_rbm_epochs", 5, 5)  # fixed at 5; earlier search range was (24, 33)
    batch_size = trial.suggest_int("batch_size", 192, 1024)
    rbm_lr = trial.suggest_float("rbm_lr", 0.05, 0.1)
    rbm_hidden = trial.suggest_int("rbm_hidden", 384, 8192)

    mlflow.start_run(experiment_id=experiment.experiment_id)
    if CLASSIFIER != 'LogisticRegression':
        fnn_hidden = trial.suggest_int("fnn_hidden", 192, 384)
        fnn_lr = trial.suggest_float("fnn_lr", 0.0001, 0.0025)
        mlflow.log_param("fnn_hidden", fnn_hidden)
        mlflow.log_param("fnn_lr", fnn_lr)

    num_classifier_epochs = trial.suggest_int("num_classifier_epochs", 5, 5)  # fixed at 5; earlier search range was (40, 60)

    mlflow.log_param("num_rbm_epochs", num_rbm_epochs)
    mlflow.log_param("batch_size", batch_size)
    mlflow.log_param("rbm_lr", rbm_lr)
    mlflow.log_param("rbm_hidden", rbm_hidden)
    mlflow.log_param("num_classifier_epochs", num_classifier_epochs)

    # Instantiate RBM and optimizer
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")  # Apple-silicon GPU if available
    rbm = RBM(n_visible=784, n_hidden=rbm_hidden, k=1).to(device)
    optimizer = torch.optim.SGD(rbm.parameters(), lr=rbm_lr)

    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

    rbm_training_failed = False
    # Training loop (assuming train_loader yields batches of images and labels)
    for epoch in range(num_rbm_epochs):
        total_loss = 0.0
        for images, _ in train_loader:
            # Flatten images and binarize
            v0 = images.view(-1, 784).to(rbm.W.device)      # shape [batch_size, 784]
            v0 = torch.bernoulli(v0)                        # treat pixel intensities in [0, 1] as Bernoulli probabilities
            vk = rbm(v0)                                    # k-step CD reconstruction
            # Compute contrastive divergence loss (free energy difference)
            loss = rbm.free_energy(v0) - rbm.free_energy(vk)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"Epoch {epoch+1}: avg free-energy loss = {total_loss/len(train_loader):.4f}")
        if np.isnan(total_loss):
            rbm_training_failed = True
            break

    if rbm_training_failed:
        accuracy = 0.0
    else:
        rbm.eval()  # set evaluation mode (good practice, though no layers here behave differently)
        features_list = []
        labels_list = []
        for images, labels in train_loader:
            v = images.view(-1, 784).to(rbm.W.device)
            # use raw normalized pixels here; they could also be binarized as in training
            h_prob, h_sample = rbm.sample_h(v)  # get hidden activations
            features_list.append(h_prob.cpu().detach().numpy())
            labels_list.append(labels.numpy())
        train_features = np.concatenate(features_list)  # shape: [N_train, n_hidden]
        train_labels = np.concatenate(labels_list)

        # Convert pre-extracted training features and labels to tensors and create a DataLoader
        train_features_tensor = torch.tensor(train_features, dtype=torch.float32)
        train_labels_tensor = torch.tensor(train_labels, dtype=torch.long)
        train_feature_dataset = torch.utils.data.TensorDataset(train_features_tensor, train_labels_tensor)
        train_feature_loader = torch.utils.data.DataLoader(train_feature_dataset, batch_size=batch_size, shuffle=True)

            
        if CLASSIFIER == 'LogisticRegression':
            # tune C with Optuna, as in the logistic regression model without RBM features
            lr_C = trial.suggest_float("lr_C", 0.01, 10.0, log=True)  
            mlflow.log_param("lr_C", lr_C)  # Log the chosen C value

            classifier = LogisticRegression(max_iter=num_classifier_epochs, C=lr_C, solver="saga") 
            classifier.fit(train_features, train_labels)            
            
        else:
            classifier = nn.Sequential(
                nn.Linear(rbm.n_hidden, fnn_hidden),
                nn.ReLU(),
                nn.Linear(fnn_hidden, 10)
            )

            # Move classifier to the same device as the RBM
            classifier = classifier.to(device)
            criterion = nn.CrossEntropyLoss()
            classifier_optimizer = torch.optim.Adam(classifier.parameters(), lr=fnn_lr)

            classifier.train()
            for epoch in range(num_classifier_epochs):
                running_loss = 0.0
                for features, labels in train_feature_loader:
                    features = features.to(device)
                    labels = labels.to(device)
                    
                    # Forward pass through classifier
                    outputs = classifier(features)
                    loss = criterion(outputs, labels)
                    
                    # Backpropagation and optimization
                    classifier_optimizer.zero_grad()
                    loss.backward()
                    classifier_optimizer.step()
                    
                    running_loss += loss.item()
                avg_loss = running_loss / len(train_feature_loader)
                print(f"Classifier Epoch {epoch+1}: loss = {avg_loss:.4f}")

        # Evaluate the classifier on test data.
        # Here we extract features from the RBM for each test image.
        if CLASSIFIER != 'LogisticRegression':
            classifier.eval()
            correct = 0
            total = 0
        features_list = []
        labels_list = []
        with torch.no_grad():
            for images, labels in test_loader:
                v = images.view(-1, 784).to(device)
                # Extract hidden activations; you can use either h_prob or h_sample.
                h_prob, _ = rbm.sample_h(v)
                if CLASSIFIER == 'LogisticRegression':
                    features_list.append(h_prob.cpu().detach().numpy())
                    labels_list.append(labels.numpy())
                else:
                    outputs = classifier(h_prob)
                    _, predicted = torch.max(outputs.data, 1)
                    total += labels.size(0)
                    correct += (predicted.cpu() == labels).sum().item()

        if CLASSIFIER == 'LogisticRegression':
            test_features = np.concatenate(features_list)
            test_labels = np.concatenate(labels_list)
            predictions = classifier.predict(test_features)
            accuracy = accuracy_score(test_labels, predictions) * 100
        else:
            accuracy = 100 * correct / total

        print(f"Test Accuracy: {accuracy:.2f}%")

    mlflow.log_metric("test_accuracy", accuracy)
    mlflow.end_run()

    return accuracy

if __name__ == "__main__":
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=50)
    print(study.best_params)
    print(study.best_value)
    print(study.best_trial)
Epoch 1: avg free-energy loss = 41.1442
Epoch 2: avg free-energy loss = 13.3773
Epoch 3: avg free-energy loss = 9.8521
Epoch 4: avg free-energy loss = 8.0321
Epoch 5: avg free-energy loss = 6.8746
Classifier Epoch 1: loss = 0.5213
Classifier Epoch 2: loss = 0.3976
Classifier Epoch 3: loss = 0.3749
Classifier Epoch 4: loss = 0.3471
Classifier Epoch 5: loss = 0.3272
Test Accuracy: 85.59%
Epoch 1: avg free-energy loss = 51.2651
Epoch 2: avg free-energy loss = 10.8171
Epoch 3: avg free-energy loss = 6.6248
Epoch 4: avg free-energy loss = 4.1506
Epoch 5: avg free-energy loss = 2.7934
Classifier Epoch 1: loss = 0.7158
Classifier Epoch 2: loss = 0.4763
Classifier Epoch 3: loss = 0.4379
Classifier Epoch 4: loss = 0.4142
Classifier Epoch 5: loss = 0.3923
Test Accuracy: 84.56%
Epoch 1: avg free-energy loss = 52.6578
Epoch 2: avg free-energy loss = 11.7883
Epoch 3: avg free-energy loss = 8.3948
Epoch 4: avg free-energy loss = 6.5664
Epoch 5: avg free-energy loss = 5.4403
Classifier Epoch 1: loss = 0.7110
Classifier Epoch 2: loss = 0.4786
Classifier Epoch 3: loss = 0.4406
Classifier Epoch 4: loss = 0.4189
Classifier Epoch 5: loss = 0.3993
Test Accuracy: 84.24%
Epoch 1: avg free-energy loss = 71.9135
Epoch 2: avg free-energy loss = 22.5843
Epoch 3: avg free-energy loss = 17.8595
Epoch 4: avg free-energy loss = 15.3543
Epoch 5: avg free-energy loss = 13.8862
Classifier Epoch 1: loss = 0.5642
Classifier Epoch 2: loss = 0.4330
Classifier Epoch 3: loss = 0.4014
Classifier Epoch 4: loss = 0.3806
Classifier Epoch 5: loss = 0.3607
Test Accuracy: 86.23%
Epoch 1: avg free-energy loss = 92.1053
Epoch 2: avg free-energy loss = 26.7928
Epoch 3: avg free-energy loss = 21.1794
Epoch 4: avg free-energy loss = 18.4277
Epoch 5: avg free-energy loss = 16.8116
Classifier Epoch 1: loss = 0.6061
Classifier Epoch 2: loss = 0.4474
Classifier Epoch 3: loss = 0.4139
Classifier Epoch 4: loss = 0.3952
Classifier Epoch 5: loss = 0.3797
Test Accuracy: 84.67%
Epoch 1: avg free-energy loss = 39.4780
Epoch 2: avg free-energy loss = 10.1831
Epoch 3: avg free-energy loss = 7.9659
Epoch 4: avg free-energy loss = 6.8108
Epoch 5: avg free-energy loss = 5.9059
Classifier Epoch 1: loss = 0.6965
Classifier Epoch 2: loss = 0.4758
Classifier Epoch 3: loss = 0.4419
Classifier Epoch 4: loss = 0.4219
Classifier Epoch 5: loss = 0.4028
Test Accuracy: 84.87%
Epoch 1: avg free-energy loss = 320.9536
Epoch 2: avg free-energy loss = 94.3052
Epoch 3: avg free-energy loss = 61.1653
Epoch 4: avg free-energy loss = 46.2653
Epoch 5: avg free-energy loss = 37.1030
Classifier Epoch 1: loss = 0.6417
Classifier Epoch 2: loss = 0.4454
Classifier Epoch 3: loss = 0.4059
Classifier Epoch 4: loss = 0.3856
Classifier Epoch 5: loss = 0.3706
Test Accuracy: 85.00%
Epoch 1: avg free-energy loss = 403.7881
Epoch 2: avg free-energy loss = 121.6750
Epoch 3: avg free-energy loss = 80.3368
Epoch 4: avg free-energy loss = 59.9126
Epoch 5: avg free-energy loss = 48.5322
Classifier Epoch 1: loss = 0.6895
Classifier Epoch 2: loss = 0.4594
Classifier Epoch 3: loss = 0.4200
Classifier Epoch 4: loss = 0.4043
Classifier Epoch 5: loss = 0.3834
Test Accuracy: 84.73%
Epoch 1: avg free-energy loss = 122.5597
Epoch 2: avg free-energy loss = 35.1586
Epoch 3: avg free-energy loss = 23.1525
Epoch 4: avg free-energy loss = 18.0908
Epoch 5: avg free-energy loss = 15.2489
Classifier Epoch 1: loss = 0.5723
Classifier Epoch 2: loss = 0.4222
Classifier Epoch 3: loss = 0.3881
Classifier Epoch 4: loss = 0.3728
Classifier Epoch 5: loss = 0.3504
Test Accuracy: 86.31%
Epoch 1: avg free-energy loss = 458.6334
Epoch 2: avg free-energy loss = 141.0498
Epoch 3: avg free-energy loss = 93.7296
Epoch 4: avg free-energy loss = 73.0652
Epoch 5: avg free-energy loss = 57.8178
Classifier Epoch 1: loss = 0.6995
Classifier Epoch 2: loss = 0.4669
Classifier Epoch 3: loss = 0.4267
Classifier Epoch 4: loss = 0.4059
Classifier Epoch 5: loss = 0.3906
Test Accuracy: 84.34%
Epoch 1: avg free-energy loss = 164.1483
Epoch 2: avg free-energy loss = 49.9797
Epoch 3: avg free-energy loss = 32.5928
Epoch 4: avg free-energy loss = 25.8441
Epoch 5: avg free-energy loss = 21.1359
Classifier Epoch 1: loss = 0.5876
Classifier Epoch 2: loss = 0.4144
Classifier Epoch 3: loss = 0.3803
Classifier Epoch 4: loss = 0.3544
Classifier Epoch 5: loss = 0.3461
Test Accuracy: 85.65%
Epoch 1: avg free-energy loss = 95.0740
Epoch 2: avg free-energy loss = 26.3020
Epoch 3: avg free-energy loss = 18.8624
Epoch 4: avg free-energy loss = 15.5844
Epoch 5: avg free-energy loss = 13.6336
Classifier Epoch 1: loss = 0.8127
Classifier Epoch 2: loss = 0.5096
Classifier Epoch 3: loss = 0.4621
Classifier Epoch 4: loss = 0.4379
Classifier Epoch 5: loss = 0.4214
Test Accuracy: 83.64%
Epoch 1: avg free-energy loss = 73.6810
Epoch 2: avg free-energy loss = 21.5836
Epoch 3: avg free-energy loss = 16.4904
Epoch 4: avg free-energy loss = 13.9487
Epoch 5: avg free-energy loss = 12.3566
Classifier Epoch 1: loss = 0.5974
Classifier Epoch 2: loss = 0.4405
Classifier Epoch 3: loss = 0.4066
Classifier Epoch 4: loss = 0.3740
Classifier Epoch 5: loss = 0.3622
Test Accuracy: 85.38%
Epoch 1: avg free-energy loss = 169.3721
Epoch 2: avg free-energy loss = 46.9003
Epoch 3: avg free-energy loss = 30.8309
Epoch 4: avg free-energy loss = 24.8842
Epoch 5: avg free-energy loss = 21.0956
Classifier Epoch 1: loss = 0.6443
Classifier Epoch 2: loss = 0.4552
Classifier Epoch 3: loss = 0.4188
Classifier Epoch 4: loss = 0.3990
Classifier Epoch 5: loss = 0.3846
Test Accuracy: 84.76%
Epoch 1: avg free-energy loss = -10.3921
Epoch 2: avg free-energy loss = -10.1069
Epoch 3: avg free-energy loss = -8.4857
Epoch 4: avg free-energy loss = -7.2676
Epoch 5: avg free-energy loss = -6.2592
Classifier Epoch 1: loss = 0.5580
Classifier Epoch 2: loss = 0.4209
Classifier Epoch 3: loss = 0.3892
Classifier Epoch 4: loss = 0.3672
Classifier Epoch 5: loss = 0.3519
Test Accuracy: 85.51%
Epoch 1: avg free-energy loss = 89.0311
Epoch 2: avg free-energy loss = 23.6226
Epoch 3: avg free-energy loss = 15.7039
Epoch 4: avg free-energy loss = 12.2822
Epoch 5: avg free-energy loss = 10.4378
Classifier Epoch 1: loss = 0.5935
Classifier Epoch 2: loss = 0.4254
Classifier Epoch 3: loss = 0.3960
Classifier Epoch 4: loss = 0.3748
Classifier Epoch 5: loss = 0.3591
Test Accuracy: 86.27%
Epoch 1: avg free-energy loss = 105.0794
Epoch 2: avg free-energy loss = 27.8303
Epoch 3: avg free-energy loss = 18.1141
Epoch 4: avg free-energy loss = 14.2598
Epoch 5: avg free-energy loss = 11.7045
Classifier Epoch 1: loss = 0.5736
Classifier Epoch 2: loss = 0.4233
Classifier Epoch 3: loss = 0.3886
Classifier Epoch 4: loss = 0.3585
Classifier Epoch 5: loss = 0.3408
Test Accuracy: 85.99%
Epoch 1: avg free-energy loss = -9.4493
Epoch 2: avg free-energy loss = -11.3147
Epoch 3: avg free-energy loss = -10.1384
Epoch 4: avg free-energy loss = -9.1680
Epoch 5: avg free-energy loss = -8.1800
Classifier Epoch 1: loss = 0.6607
Classifier Epoch 2: loss = 0.4645
Classifier Epoch 3: loss = 0.4321
Classifier Epoch 4: loss = 0.4088
Classifier Epoch 5: loss = 0.3891
Test Accuracy: 84.13%
Epoch 1: avg free-energy loss = 112.3624
Epoch 2: avg free-energy loss = 31.7360
Epoch 3: avg free-energy loss = 20.3989
Epoch 4: avg free-energy loss = 15.3285
Epoch 5: avg free-energy loss = 12.7928
Classifier Epoch 1: loss = 0.6094
Classifier Epoch 2: loss = 0.4393
Classifier Epoch 3: loss = 0.4008
Classifier Epoch 4: loss = 0.3739
Classifier Epoch 5: loss = 0.3602
Test Accuracy: 85.91%
Epoch 1: avg free-energy loss = 241.7577
Epoch 2: avg free-energy loss = 78.0329
Epoch 3: avg free-energy loss = 52.1741
Epoch 4: avg free-energy loss = 37.2763
Epoch 5: avg free-energy loss = 31.4904
Classifier Epoch 1: loss = 0.6753
Classifier Epoch 2: loss = 0.4577
Classifier Epoch 3: loss = 0.4233
Classifier Epoch 4: loss = 0.3942
Classifier Epoch 5: loss = 0.3824
Test Accuracy: 85.65%
Epoch 1: avg free-energy loss = 80.2170
Epoch 2: avg free-energy loss = 21.9503
Epoch 3: avg free-energy loss = 15.2571
Epoch 4: avg free-energy loss = 11.8656
Epoch 5: avg free-energy loss = 10.0759
Classifier Epoch 1: loss = 0.6182
Classifier Epoch 2: loss = 0.4349
Classifier Epoch 3: loss = 0.4019
Classifier Epoch 4: loss = 0.3788
Classifier Epoch 5: loss = 0.3646
Test Accuracy: 86.09%
Epoch 1: avg free-energy loss = 95.1577
Epoch 2: avg free-energy loss = 26.6906
Epoch 3: avg free-energy loss = 19.8070
Epoch 4: avg free-energy loss = 16.7382
Epoch 5: avg free-energy loss = 14.8592
Classifier Epoch 1: loss = 0.5755
Classifier Epoch 2: loss = 0.4317
Classifier Epoch 3: loss = 0.4003
Classifier Epoch 4: loss = 0.3777
Classifier Epoch 5: loss = 0.3587
Test Accuracy: 86.58%
Epoch 1: avg free-energy loss = 151.1207
Epoch 2: avg free-energy loss = 37.0672
Epoch 3: avg free-energy loss = 26.0086
Epoch 4: avg free-energy loss = 20.5716
Epoch 5: avg free-energy loss = 17.5519
Classifier Epoch 1: loss = 0.5953
Classifier Epoch 2: loss = 0.4474
Classifier Epoch 3: loss = 0.4130
Classifier Epoch 4: loss = 0.3876
Classifier Epoch 5: loss = 0.3727
Test Accuracy: 86.07%
Epoch 1: avg free-energy loss = 125.3522
Epoch 2: avg free-energy loss = 36.8164
Epoch 3: avg free-energy loss = 25.2092
Epoch 4: avg free-energy loss = 20.3698
Epoch 5: avg free-energy loss = 17.6231
Classifier Epoch 1: loss = 0.5703
Classifier Epoch 2: loss = 0.4260
Classifier Epoch 3: loss = 0.3856
Classifier Epoch 4: loss = 0.3651
Classifier Epoch 5: loss = 0.3507
Test Accuracy: 86.00%
Epoch 1: avg free-energy loss = 114.6826
Epoch 2: avg free-energy loss = 53.7552
Epoch 3: avg free-energy loss = 26.0180
Epoch 4: avg free-energy loss = 22.9473
Epoch 5: avg free-energy loss = 21.3614
Classifier Epoch 1: loss = 0.6536
Classifier Epoch 2: loss = 0.4553
Classifier Epoch 3: loss = 0.4137
Classifier Epoch 4: loss = 0.4054
Classifier Epoch 5: loss = 0.3769
Test Accuracy: 82.07%
Epoch 1: avg free-energy loss = 48.3332
Epoch 2: avg free-energy loss = 14.7865
Epoch 3: avg free-energy loss = 11.0317
Epoch 4: avg free-energy loss = 9.3471
Epoch 5: avg free-energy loss = 8.3662
Classifier Epoch 1: loss = 0.5790
Classifier Epoch 2: loss = 0.4320
Classifier Epoch 3: loss = 0.4038
Classifier Epoch 4: loss = 0.3767
Classifier Epoch 5: loss = 0.3553
Test Accuracy: 86.19%
Epoch 1: avg free-energy loss = 17.3310
Epoch 2: avg free-energy loss = 1.5393
Epoch 3: avg free-energy loss = 0.1846
Epoch 4: avg free-energy loss = -0.2184
Epoch 5: avg free-energy loss = -0.5774
Classifier Epoch 1: loss = 0.6372
Classifier Epoch 2: loss = 0.4535
Classifier Epoch 3: loss = 0.4209
Classifier Epoch 4: loss = 0.3994
Classifier Epoch 5: loss = 0.3779
Test Accuracy: 84.84%
Epoch 1: avg free-energy loss = 140.1578
Epoch 2: avg free-energy loss = 37.5575
Epoch 3: avg free-energy loss = 24.7901
Epoch 4: avg free-energy loss = 19.1708
Epoch 5: avg free-energy loss = 15.9026
Classifier Epoch 1: loss = 0.6525
Classifier Epoch 2: loss = 0.4436
Classifier Epoch 3: loss = 0.4111
Classifier Epoch 4: loss = 0.3894
Classifier Epoch 5: loss = 0.3749
Test Accuracy: 84.82%
Epoch 1: avg free-energy loss = 145.5177
Epoch 2: avg free-energy loss = 43.2769
Epoch 3: avg free-energy loss = 30.1405
Epoch 4: avg free-energy loss = 24.0759
Epoch 5: avg free-energy loss = 20.5072
Classifier Epoch 1: loss = 0.5314
Classifier Epoch 2: loss = 0.4079
Classifier Epoch 3: loss = 0.3772
Classifier Epoch 4: loss = 0.3566
Classifier Epoch 5: loss = 0.3346
Test Accuracy: 86.54%
Epoch 1: avg free-energy loss = 139.6170
Epoch 2: avg free-energy loss = 41.2337
Epoch 3: avg free-energy loss = 28.9471
Epoch 4: avg free-energy loss = 23.2911
Epoch 5: avg free-energy loss = 19.7793
Classifier Epoch 1: loss = 0.5359
Classifier Epoch 2: loss = 0.4040
Classifier Epoch 3: loss = 0.3739
Classifier Epoch 4: loss = 0.3533
Classifier Epoch 5: loss = 0.3370
Test Accuracy: 85.48%
Epoch 1: avg free-energy loss = 155.9953
Epoch 2: avg free-energy loss = 44.8035
Epoch 3: avg free-energy loss = 31.5432
Epoch 4: avg free-energy loss = 25.7398
Epoch 5: avg free-energy loss = 22.2983
Classifier Epoch 1: loss = 0.5411
Classifier Epoch 2: loss = 0.4138
Classifier Epoch 3: loss = 0.3856
Classifier Epoch 4: loss = 0.3583
Classifier Epoch 5: loss = 0.3411
Test Accuracy: 86.32%
Epoch 1: avg free-energy loss = 153.2852
Epoch 2: avg free-energy loss = 45.7144
Epoch 3: avg free-energy loss = 32.1552
Epoch 4: avg free-energy loss = 26.3964
Epoch 5: avg free-energy loss = 23.1470
Classifier Epoch 1: loss = 0.5518
Classifier Epoch 2: loss = 0.4154
Classifier Epoch 3: loss = 0.3849
Classifier Epoch 4: loss = 0.3604
Classifier Epoch 5: loss = 0.3449
Test Accuracy: 85.75%
Epoch 1: avg free-energy loss = 144.0413
Epoch 2: avg free-energy loss = 37.8745
Epoch 3: avg free-energy loss = 28.1547
Epoch 4: avg free-energy loss = 23.8362
Epoch 5: avg free-energy loss = 21.1103
Classifier Epoch 1: loss = 0.5451
Classifier Epoch 2: loss = 0.4218
Classifier Epoch 3: loss = 0.3893
Classifier Epoch 4: loss = 0.3644
Classifier Epoch 5: loss = 0.3597
Test Accuracy: 85.55%
Epoch 1: avg free-energy loss = 122.0357
Epoch 2: avg free-energy loss = 32.5872
Epoch 3: avg free-energy loss = 23.8392
Epoch 4: avg free-energy loss = 20.4337
Epoch 5: avg free-energy loss = 18.0269
Classifier Epoch 1: loss = 0.6001
Classifier Epoch 2: loss = 0.4407
Classifier Epoch 3: loss = 0.4051
Classifier Epoch 4: loss = 0.3717
Classifier Epoch 5: loss = 0.3566
Test Accuracy: 85.01%
Epoch 1: avg free-energy loss = 146.0580
Epoch 2: avg free-energy loss = 43.7938
Epoch 3: avg free-energy loss = 31.2302
Epoch 4: avg free-energy loss = 24.6140
Epoch 5: avg free-energy loss = 21.3218
Classifier Epoch 1: loss = 0.5395
Classifier Epoch 2: loss = 0.4114
Classifier Epoch 3: loss = 0.3776
Classifier Epoch 4: loss = 0.3592
Classifier Epoch 5: loss = 0.3371
Test Accuracy: 86.11%
Epoch 1: avg free-energy loss = 203.7523
Epoch 2: avg free-energy loss = 56.8096
Epoch 3: avg free-energy loss = 38.0626
Epoch 4: avg free-energy loss = 30.4193
Epoch 5: avg free-energy loss = 24.9903
Classifier Epoch 1: loss = 0.5848
Classifier Epoch 2: loss = 0.4187
Classifier Epoch 3: loss = 0.3854
Classifier Epoch 4: loss = 0.3586
Classifier Epoch 5: loss = 0.3502
Test Accuracy: 86.05%
Epoch 1: avg free-energy loss = 57.5091
Epoch 2: avg free-energy loss = 20.7694
Epoch 3: avg free-energy loss = 16.6623
Epoch 4: avg free-energy loss = 14.8152
Epoch 5: avg free-energy loss = 13.4973
Classifier Epoch 1: loss = 0.5423
Classifier Epoch 2: loss = 0.4187
Classifier Epoch 3: loss = 0.3846
Classifier Epoch 4: loss = 0.3643
Classifier Epoch 5: loss = 0.3521
Test Accuracy: 85.45%
Epoch 1: avg free-energy loss = 143.3145
Epoch 2: avg free-energy loss = 33.9704
Epoch 3: avg free-energy loss = 23.6725
Epoch 4: avg free-energy loss = 19.3836
Epoch 5: avg free-energy loss = 16.5939
Classifier Epoch 1: loss = 0.6145
Classifier Epoch 2: loss = 0.4406
Classifier Epoch 3: loss = 0.4070
Classifier Epoch 4: loss = 0.3769
Classifier Epoch 5: loss = 0.3651
Test Accuracy: 85.65%
Epoch 1: avg free-energy loss = 256.5661
Epoch 2: avg free-energy loss = 59.7481
Epoch 3: avg free-energy loss = 35.7942
Epoch 4: avg free-energy loss = 25.1669
Epoch 5: avg free-energy loss = 20.8809
Classifier Epoch 1: loss = 0.7320
Classifier Epoch 2: loss = 0.4937
Classifier Epoch 3: loss = 0.4554
Classifier Epoch 4: loss = 0.4339
Classifier Epoch 5: loss = 0.4185
Test Accuracy: 84.15%
Epoch 1: avg free-energy loss = 122.4898
Epoch 2: avg free-energy loss = 34.5530
Epoch 3: avg free-energy loss = 24.5251
Epoch 4: avg free-energy loss = 20.6756
Epoch 5: avg free-energy loss = 18.4158
Classifier Epoch 1: loss = 0.5601
Classifier Epoch 2: loss = 0.4250
Classifier Epoch 3: loss = 0.3860
Classifier Epoch 4: loss = 0.3633
Classifier Epoch 5: loss = 0.3520
Test Accuracy: 86.29%
Epoch 1: avg free-energy loss = 126.8193
Epoch 2: avg free-energy loss = 37.0817
Epoch 3: avg free-energy loss = 25.9812
Epoch 4: avg free-energy loss = 20.9937
Epoch 5: avg free-energy loss = 17.9932
Classifier Epoch 1: loss = 0.5466
Classifier Epoch 2: loss = 0.4123
Classifier Epoch 3: loss = 0.3767
Classifier Epoch 4: loss = 0.3582
Classifier Epoch 5: loss = 0.3445
Test Accuracy: 85.55%
Epoch 1: avg free-energy loss = 138.7841
Epoch 2: avg free-energy loss = 39.0252
Epoch 3: avg free-energy loss = 26.9802
Epoch 4: avg free-energy loss = 21.7643
Epoch 5: avg free-energy loss = 18.6815
Classifier Epoch 1: loss = 0.5636
Classifier Epoch 2: loss = 0.4298
Classifier Epoch 3: loss = 0.3867
Classifier Epoch 4: loss = 0.3638
Classifier Epoch 5: loss = 0.3481
Test Accuracy: 86.50%
Epoch 1: avg free-energy loss = 154.5159
Epoch 2: avg free-energy loss = 43.7947
Epoch 3: avg free-energy loss = 31.7493
Epoch 4: avg free-energy loss = 26.3149
Epoch 5: avg free-energy loss = 22.8085
Classifier Epoch 1: loss = 0.5366
Classifier Epoch 2: loss = 0.4173
Classifier Epoch 3: loss = 0.3788
Classifier Epoch 4: loss = 0.3599
Classifier Epoch 5: loss = 0.3422
Test Accuracy: 86.66%
Epoch 1: avg free-energy loss = 146.4180
Epoch 2: avg free-energy loss = 41.2446
Epoch 3: avg free-energy loss = 29.6270
Epoch 4: avg free-energy loss = 24.5179
Epoch 5: avg free-energy loss = 21.2968
Classifier Epoch 1: loss = 0.5587
Classifier Epoch 2: loss = 0.4224
Classifier Epoch 3: loss = 0.3826
Classifier Epoch 4: loss = 0.3593
Classifier Epoch 5: loss = 0.3474
Test Accuracy: 86.22%
Epoch 1: avg free-energy loss = 153.0604
Epoch 2: avg free-energy loss = 38.7216
Epoch 3: avg free-energy loss = 28.1554
Epoch 4: avg free-energy loss = 23.6256
Epoch 5: avg free-energy loss = 21.2592
Classifier Epoch 1: loss = 0.5506
Classifier Epoch 2: loss = 0.4208
Classifier Epoch 3: loss = 0.3897
Classifier Epoch 4: loss = 0.3720
Classifier Epoch 5: loss = 0.3518
Test Accuracy: 84.84%
Epoch 1: avg free-energy loss = 166.1443
Epoch 2: avg free-energy loss = 46.4790
Epoch 3: avg free-energy loss = 32.0942
Epoch 4: avg free-energy loss = 26.2845
Epoch 5: avg free-energy loss = 22.5830
Classifier Epoch 1: loss = 0.5713
Classifier Epoch 2: loss = 0.4153
Classifier Epoch 3: loss = 0.3843
Classifier Epoch 4: loss = 0.3637
Classifier Epoch 5: loss = 0.3435
Test Accuracy: 86.77%
Epoch 1: avg free-energy loss = 217.4522
Epoch 2: avg free-energy loss = 56.0750
Epoch 3: avg free-energy loss = 37.5405
Epoch 4: avg free-energy loss = 28.9540
Epoch 5: avg free-energy loss = 26.0145
Classifier Epoch 1: loss = 0.6269
Classifier Epoch 2: loss = 0.4323
Classifier Epoch 3: loss = 0.3935
Classifier Epoch 4: loss = 0.3625
Classifier Epoch 5: loss = 0.3469
Test Accuracy: 86.54%
Epoch 1: avg free-energy loss = 208.8774
Epoch 2: avg free-energy loss = 54.9503
Epoch 3: avg free-energy loss = 38.0580
Epoch 4: avg free-energy loss = 29.2487
Epoch 5: avg free-energy loss = 24.2918
Classifier Epoch 1: loss = 0.6660
Classifier Epoch 2: loss = 0.4332
Classifier Epoch 3: loss = 0.4066
Classifier Epoch 4: loss = 0.3764
Classifier Epoch 5: loss = 0.3647
Test Accuracy: 84.84%
Epoch 1: avg free-energy loss = 204.2048
Epoch 2: avg free-energy loss = 57.3754
Epoch 3: avg free-energy loss = 39.3723
Epoch 4: avg free-energy loss = 32.0937
Epoch 5: avg free-energy loss = 27.4903
Classifier Epoch 1: loss = 0.5816
Classifier Epoch 2: loss = 0.4200
Classifier Epoch 3: loss = 0.3822
Classifier Epoch 4: loss = 0.3646
Classifier Epoch 5: loss = 0.3438
Test Accuracy: 83.88%
Epoch 1: avg free-energy loss = 267.3551
Epoch 2: avg free-energy loss = 65.9227
Epoch 3: avg free-energy loss = 42.0068
Epoch 4: avg free-energy loss = 33.0611
Epoch 5: avg free-energy loss = 26.3194
Classifier Epoch 1: loss = 0.7123
Classifier Epoch 2: loss = 0.4635
Classifier Epoch 3: loss = 0.4277
Classifier Epoch 4: loss = 0.3987
Classifier Epoch 5: loss = 0.3845
Test Accuracy: 84.89%
{'num_rbm_epochs': 5, 'batch_size': 244, 'rbm_lr': 0.07049262688811203, 'rbm_hidden': 7387, 'fnn_hidden': 245, 'fnn_lr': 0.0018524990979230458, 'num_classifier_epochs': 5}
86.77
FrozenTrial(number=45, state=1, values=[86.77], datetime_start=datetime.datetime(2025, 3, 16, 10, 47, 30, 685079), datetime_complete=datetime.datetime(2025, 3, 16, 10, 48, 19, 119008), params={'num_rbm_epochs': 5, 'batch_size': 244, 'rbm_lr': 0.07049262688811203, 'rbm_hidden': 7387, 'fnn_hidden': 245, 'fnn_lr': 0.0018524990979230458, 'num_classifier_epochs': 5}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'num_rbm_epochs': IntDistribution(high=5, log=False, low=5, step=1), 'batch_size': IntDistribution(high=1024, log=False, low=192, step=1), 'rbm_lr': FloatDistribution(high=0.1, log=False, low=0.05, step=None), 'rbm_hidden': IntDistribution(high=8192, log=False, low=384, step=1), 'fnn_hidden': IntDistribution(high=384, log=False, low=192, step=1), 'fnn_lr': FloatDistribution(high=0.0025, log=False, low=0.0001, step=None), 'num_classifier_epochs': IntDistribution(high=5, log=False, low=5, step=1)}, trial_id=45, value=None)

[I 2025-03-16 10:20:17,231] A new study created in memory with name: no-name-3fd60331-c125-44cc-a73c-619d366ba06b
[I 2025-03-16 10:20:49,479] Trial 0 finished with value: 85.59 and parameters: {'num_rbm_epochs': 5, 'batch_size': 198, 'rbm_lr': 0.08436361841264595, 'rbm_hidden': 2604, 'fnn_hidden': 306, 'fnn_lr': 0.0010838070161156045, 'num_classifier_epochs': 5}. Best is trial 0 with value: 85.59.
[I 2025-03-16 10:21:10,089] Trial 1 finished with value: 84.56 and parameters: {'num_rbm_epochs': 5, 'batch_size': 991, 'rbm_lr': 0.09517929706969339, 'rbm_hidden': 1810, 'fnn_hidden': 296, 'fnn_lr': 0.0015792763243647335, 'num_classifier_epochs': 5}. Best is trial 0 with value: 85.59.
[I 2025-03-16 10:21:32,289] Trial 2 finished with value: 84.24 and parameters: {'num_rbm_epochs': 5, 'batch_size': 748, 'rbm_lr': 0.06911001243709392, 'rbm_hidden': 2118, 'fnn_hidden': 323, 'fnn_lr': 0.0007756834698615648, 'num_classifier_epochs': 5}. Best is trial 0 with value: 85.59.
[I 2025-03-16 10:22:09,369] Trial 3 finished with value: 86.23 and parameters: {'num_rbm_epochs': 5, 'batch_size': 263, 'rbm_lr': 0.058190166539460526, 'rbm_hidden': 4239, 'fnn_hidden': 281, 'fnn_lr': 0.0007267706569254254, 'num_classifier_epochs': 5}. Best is trial 3 with value: 86.23.
[I 2025-03-16 10:23:00,353] Trial 4 finished with value: 84.67 and parameters: {'num_rbm_epochs': 5, 'batch_size': 195, 'rbm_lr': 0.0565270769168038, 'rbm_hidden': 6573, 'fnn_hidden': 316, 'fnn_lr': 0.0002515160235790303, 'num_classifier_epochs': 5}. Best is trial 3 with value: 86.23.
[I 2025-03-16 10:23:23,385] Trial 5 finished with value: 84.87 and parameters: {'num_rbm_epochs': 5, 'batch_size': 583, 'rbm_lr': 0.0569694283986999, 'rbm_hidden': 2234, 'fnn_hidden': 218, 'fnn_lr': 0.0007239785815181317, 'num_classifier_epochs': 5}. Best is trial 3 with value: 86.23.
[I 2025-03-16 10:23:58,830] Trial 6 finished with value: 85.0 and parameters: {'num_rbm_epochs': 5, 'batch_size': 660, 'rbm_lr': 0.09784453071169795, 'rbm_hidden': 6315, 'fnn_hidden': 196, 'fnn_lr': 0.0013277043439209935, 'num_classifier_epochs': 5}. Best is trial 3 with value: 86.23.
[I 2025-03-16 10:24:38,929] Trial 7 finished with value: 84.73 and parameters: {'num_rbm_epochs': 5, 'batch_size': 712, 'rbm_lr': 0.08371648875951856, 'rbm_hidden': 8022, 'fnn_hidden': 205, 'fnn_lr': 0.0010409495649736215, 'num_classifier_epochs': 5}. Best is trial 3 with value: 86.23.
[I 2025-03-16 10:25:12,672] Trial 8 finished with value: 86.31 and parameters: {'num_rbm_epochs': 5, 'batch_size': 354, 'rbm_lr': 0.09386436120135674, 'rbm_hidden': 4225, 'fnn_hidden': 332, 'fnn_lr': 0.0008725880811642252, 'num_classifier_epochs': 5}. Best is trial 8 with value: 86.31.
[I 2025-03-16 10:25:49,481] Trial 9 finished with value: 84.34 and parameters: {'num_rbm_epochs': 5, 'batch_size': 903, 'rbm_lr': 0.09747372730224671, 'rbm_hidden': 7322, 'fnn_hidden': 379, 'fnn_lr': 0.0006322895096422124, 'num_classifier_epochs': 5}. Best is trial 8 with value: 86.31.
[I 2025-03-16 10:26:23,823] Trial 10 finished with value: 85.65 and parameters: {'num_rbm_epochs': 5, 'batch_size': 416, 'rbm_lr': 0.08435961483147174, 'rbm_hidden': 5015, 'fnn_hidden': 372, 'fnn_lr': 0.0023431164249826976, 'num_classifier_epochs': 5}. Best is trial 8 with value: 86.31.
[I 2025-03-16 10:26:55,652] Trial 11 finished with value: 83.64 and parameters: {'num_rbm_epochs': 5, 'batch_size': 381, 'rbm_lr': 0.06780291233740607, 'rbm_hidden': 3975, 'fnn_hidden': 251, 'fnn_lr': 0.00016446063288698344, 'num_classifier_epochs': 5}. Best is trial 8 with value: 86.31.
[I 2025-03-16 10:27:27,860] Trial 12 finished with value: 85.38 and parameters: {'num_rbm_epochs': 5, 'batch_size': 375, 'rbm_lr': 0.05093322085424206, 'rbm_hidden': 4147, 'fnn_hidden': 268, 'fnn_lr': 0.0016396985845204568, 'num_classifier_epochs': 5}. Best is trial 8 with value: 86.31.
[I 2025-03-16 10:28:01,511] Trial 13 finished with value: 84.76 and parameters: {'num_rbm_epochs': 5, 'batch_size': 507, 'rbm_lr': 0.07077828351725873, 'rbm_hidden': 5262, 'fnn_hidden': 345, 'fnn_lr': 0.0004785954470162955, 'num_classifier_epochs': 5}. Best is trial 8 with value: 86.31.
[I 2025-03-16 10:28:24,865] Trial 14 finished with value: 85.51 and parameters: {'num_rbm_epochs': 5, 'batch_size': 281, 'rbm_lr': 0.07775633712632858, 'rbm_hidden': 795, 'fnn_hidden': 268, 'fnn_lr': 0.0021710908045704095, 'num_classifier_epochs': 5}. Best is trial 8 with value: 86.31.
[I 2025-03-16 10:28:52,409] Trial 15 finished with value: 86.27 and parameters: {'num_rbm_epochs': 5, 'batch_size': 482, 'rbm_lr': 0.0895957246599477, 'rbm_hidden': 3085, 'fnn_hidden': 344, 'fnn_lr': 0.0010268309089367538, 'num_classifier_epochs': 5}. Best is trial 8 with value: 86.31.
[I 2025-03-16 10:29:20,142] Trial 16 finished with value: 85.99 and parameters: {'num_rbm_epochs': 5, 'batch_size': 510, 'rbm_lr': 0.09002179358375789, 'rbm_hidden': 3294, 'fnn_hidden': 345, 'fnn_lr': 0.001811044928710925, 'num_classifier_epochs': 5}. Best is trial 8 with value: 86.31.
[I 2025-03-16 10:29:41,024] Trial 17 finished with value: 84.13 and parameters: {'num_rbm_epochs': 5, 'batch_size': 478, 'rbm_lr': 0.08961842166824682, 'rbm_hidden': 813, 'fnn_hidden': 345, 'fnn_lr': 0.0011408148875530226, 'num_classifier_epochs': 5}. Best is trial 8 with value: 86.31.
[I 2025-03-16 10:30:08,653] Trial 18 finished with value: 85.91 and parameters: {'num_rbm_epochs': 5, 'batch_size': 580, 'rbm_lr': 0.09166284032750835, 'rbm_hidden': 3297, 'fnn_hidden': 360, 'fnn_lr': 0.0014032099604874364, 'num_classifier_epochs': 5}. Best is trial 8 with value: 86.31.
[I 2025-03-16 10:30:39,262] Trial 19 finished with value: 85.65 and parameters: {'num_rbm_epochs': 5, 'batch_size': 813, 'rbm_lr': 0.07770916211142298, 'rbm_hidden': 5240, 'fnn_hidden': 328, 'fnn_lr': 0.0009404346428016813, 'num_classifier_epochs': 5}. Best is trial 8 with value: 86.31.
[I 2025-03-16 10:31:10,839] Trial 20 finished with value: 86.09 and parameters: {'num_rbm_epochs': 5, 'batch_size': 307, 'rbm_lr': 0.09276556866747745, 'rbm_hidden': 3357, 'fnn_hidden': 361, 'fnn_lr': 0.0004192078330751662, 'num_classifier_epochs': 5}. Best is trial 8 with value: 86.31.
[I 2025-03-16 10:31:45,511] Trial 21 finished with value: 86.58 and parameters: {'num_rbm_epochs': 5, 'batch_size': 300, 'rbm_lr': 0.0630845145868285, 'rbm_hidden': 4615, 'fnn_hidden': 283, 'fnn_lr': 0.0008912603968912184, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:32:23,039] Trial 22 finished with value: 86.07 and parameters: {'num_rbm_epochs': 5, 'batch_size': 421, 'rbm_lr': 0.06251654781962593, 'rbm_hidden': 5929, 'fnn_hidden': 245, 'fnn_lr': 0.000895944266287479, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:32:58,420] Trial 23 finished with value: 86.0 and parameters: {'num_rbm_epochs': 5, 'batch_size': 358, 'rbm_lr': 0.07427774583445655, 'rbm_hidden': 4726, 'fnn_hidden': 293, 'fnn_lr': 0.0012703888693112663, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:33:29,227] Trial 24 finished with value: 82.07 and parameters: {'num_rbm_epochs': 5, 'batch_size': 458, 'rbm_lr': 0.08614515411514082, 'rbm_hidden': 3727, 'fnn_hidden': 331, 'fnn_lr': 0.0005636603373389042, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:33:57,719] Trial 25 finished with value: 86.19 and parameters: {'num_rbm_epochs': 5, 'batch_size': 317, 'rbm_lr': 0.06426045482874108, 'rbm_hidden': 2756, 'fnn_hidden': 309, 'fnn_lr': 0.0008644664525628772, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:34:20,258] Trial 26 finished with value: 84.84 and parameters: {'num_rbm_epochs': 5, 'batch_size': 528, 'rbm_lr': 0.07852501111686691, 'rbm_hidden': 1490, 'fnn_hidden': 280, 'fnn_lr': 0.0011880189983864418, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:34:57,279] Trial 27 finished with value: 84.82 and parameters: {'num_rbm_epochs': 5, 'batch_size': 331, 'rbm_lr': 0.09951089419898172, 'rbm_hidden': 4612, 'fnn_hidden': 334, 'fnn_lr': 0.0003662483925512541, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:35:41,208] Trial 28 finished with value: 86.54 and parameters: {'num_rbm_epochs': 5, 'batch_size': 246, 'rbm_lr': 0.0806928992542019, 'rbm_hidden': 5818, 'fnn_hidden': 239, 'fnn_lr': 0.001421823379732054, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:36:22,697] Trial 29 finished with value: 85.48 and parameters: {'num_rbm_epochs': 5, 'batch_size': 240, 'rbm_lr': 0.08191693085219776, 'rbm_hidden': 5617, 'fnn_hidden': 233, 'fnn_lr': 0.0019379554904379302, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:37:12,847] Trial 30 finished with value: 86.32 and parameters: {'num_rbm_epochs': 5, 'batch_size': 232, 'rbm_lr': 0.0725829489456351, 'rbm_hidden': 6926, 'fnn_hidden': 258, 'fnn_lr': 0.0015406539690326852, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:38:03,831] Trial 31 finished with value: 85.75 and parameters: {'num_rbm_epochs': 5, 'batch_size': 231, 'rbm_lr': 0.07193216348130695, 'rbm_hidden': 6994, 'fnn_hidden': 257, 'fnn_lr': 0.0015120602262159544, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:38:57,382] Trial 32 finished with value: 85.55 and parameters: {'num_rbm_epochs': 5, 'batch_size': 209, 'rbm_lr': 0.06591528376492337, 'rbm_hidden': 7650, 'fnn_hidden': 233, 'fnn_lr': 0.0017625116122405467, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:39:41,544] Trial 33 finished with value: 85.01 and parameters: {'num_rbm_epochs': 5, 'batch_size': 279, 'rbm_lr': 0.061660943458468176, 'rbm_hidden': 6305, 'fnn_hidden': 304, 'fnn_lr': 0.0015334078426596813, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:40:24,190] Trial 34 finished with value: 86.11 and parameters: {'num_rbm_epochs': 5, 'batch_size': 265, 'rbm_lr': 0.08150999528526993, 'rbm_hidden': 5708, 'fnn_hidden': 270, 'fnn_lr': 0.0014069792125121856, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:41:08,016] Trial 35 finished with value: 86.05 and parameters: {'num_rbm_epochs': 5, 'batch_size': 342, 'rbm_lr': 0.07499363389043588, 'rbm_hidden': 6954, 'fnn_hidden': 236, 'fnn_lr': 0.002065296507974171, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:41:48,220] Trial 36 finished with value: 85.45 and parameters: {'num_rbm_epochs': 5, 'batch_size': 197, 'rbm_lr': 0.05099482278381415, 'rbm_hidden': 4463, 'fnn_hidden': 218, 'fnn_lr': 0.0011903289979520082, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:42:27,891] Trial 37 finished with value: 85.65 and parameters: {'num_rbm_epochs': 5, 'batch_size': 409, 'rbm_lr': 0.0598297576776619, 'rbm_hidden': 6119, 'fnn_hidden': 285, 'fnn_lr': 0.0017097004725772566, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:43:01,412] Trial 38 finished with value: 84.15 and parameters: {'num_rbm_epochs': 5, 'batch_size': 1003, 'rbm_lr': 0.05501072657712741, 'rbm_hidden': 6582, 'fnn_hidden': 297, 'fnn_lr': 0.0007606774422401882, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:43:39,369] Trial 39 finished with value: 86.29 and parameters: {'num_rbm_epochs': 5, 'batch_size': 298, 'rbm_lr': 0.06851065965858913, 'rbm_hidden': 5463, 'fnn_hidden': 255, 'fnn_lr': 0.001314444709107097, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:44:18,404] Trial 40 finished with value: 85.55 and parameters: {'num_rbm_epochs': 5, 'batch_size': 266, 'rbm_lr': 0.08653710281047691, 'rbm_hidden': 4903, 'fnn_hidden': 222, 'fnn_lr': 0.001018827542451288, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:44:55,710] Trial 41 finished with value: 86.5 and parameters: {'num_rbm_epochs': 5, 'batch_size': 320, 'rbm_lr': 0.0710162975322479, 'rbm_hidden': 5465, 'fnn_hidden': 258, 'fnn_lr': 0.0013374761899341667, 'num_classifier_epochs': 5}. Best is trial 21 with value: 86.58.
[I 2025-03-16 10:45:44,207] Trial 42 finished with value: 86.66 and parameters: {'num_rbm_epochs': 5, 'batch_size': 238, 'rbm_lr': 0.0724841516298329, 'rbm_hidden': 6840, 'fnn_hidden': 276, 'fnn_lr': 0.0013355808077534668, 'num_classifier_epochs': 5}. Best is trial 42 with value: 86.66.
[I 2025-03-16 10:46:33,191] Trial 43 finished with value: 86.22 and parameters: {'num_rbm_epochs': 5, 'batch_size': 227, 'rbm_lr': 0.07235856124118478, 'rbm_hidden': 6713, 'fnn_hidden': 273, 'fnn_lr': 0.001415057840328747, 'num_classifier_epochs': 5}. Best is trial 42 with value: 86.66.
[I 2025-03-16 10:47:30,684] Trial 44 finished with value: 84.84 and parameters: {'num_rbm_epochs': 5, 'batch_size': 201, 'rbm_lr': 0.06663060375579816, 'rbm_hidden': 8168, 'fnn_hidden': 259, 'fnn_lr': 0.0016068131480314921, 'num_classifier_epochs': 5}. Best is trial 42 with value: 86.66.
[I 2025-03-16 10:48:19,119] Trial 45 finished with value: 86.77 and parameters: {'num_rbm_epochs': 5, 'batch_size': 244, 'rbm_lr': 0.07049262688811203, 'rbm_hidden': 7387, 'fnn_hidden': 245, 'fnn_lr': 0.0018524990979230458, 'num_classifier_epochs': 5}. Best is trial 45 with value: 86.77.
[I 2025-03-16 10:49:03,522] Trial 46 finished with value: 86.54 and parameters: {'num_rbm_epochs': 5, 'batch_size': 380, 'rbm_lr': 0.0704344682782717, 'rbm_hidden': 7530, 'fnn_hidden': 242, 'fnn_lr': 0.002498329803237352, 'num_classifier_epochs': 5}. Best is trial 45 with value: 86.77.
[I 2025-03-16 10:49:46,358] Trial 47 finished with value: 84.84 and parameters: {'num_rbm_epochs': 5, 'batch_size': 379, 'rbm_lr': 0.06944476347888391, 'rbm_hidden': 7438, 'fnn_hidden': 244, 'fnn_lr': 0.002142868465968344, 'num_classifier_epochs': 5}. Best is trial 45 with value: 86.77.
[I 2025-03-16 10:50:37,448] Trial 48 finished with value: 83.88 and parameters: {'num_rbm_epochs': 5, 'batch_size': 252, 'rbm_lr': 0.079654431761058, 'rbm_hidden': 7811, 'fnn_hidden': 198, 'fnn_lr': 0.002309869969990883, 'num_classifier_epochs': 5}. Best is trial 45 with value: 86.77.
[I 2025-03-16 10:51:14,701] Trial 49 finished with value: 84.89 and parameters: {'num_rbm_epochs': 5, 'batch_size': 667, 'rbm_lr': 0.064477406468612, 'rbm_hidden': 7341, 'fnn_hidden': 208, 'fnn_lr': 0.001921665219518418, 'num_classifier_epochs': 5}. Best is trial 45 with value: 86.77.
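Here the best of the 50 trials was trial 45, which reached 86.77% test accuracy by pairing a large RBM hidden layer (7387 units) with a 245-unit FNN hidden layer; accuracy by RBM and FNN hidden-layer width is plotted below.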

[Figure: Test Accuracy by RBM Hidden Units]

[Figure: Test Accuracy by FNN Hidden Units]

Conclusion

The table below reports the test accuracy (as logged in MLflow) of each model's best Optuna trial.

    Model                                            Test Accuracy (%)
    Logistic Regression                              84.64
    Feed Forward Network                             88.10
    Convolutional Neural Network                     90.99
    Logistic Regression (on RBM Hidden Features)     86.95
    Feed Forward Network (on RBM Hidden Features)    86.98

Unsupervised RBM features were most useful to the weakest classifier: logistic regression gained more than two percentage points (84.64% to 86.95%) when trained on RBM hidden activations instead of raw pixels. The feed-forward network, by contrast, performed slightly worse on RBM features (86.98%) than on raw pixels (88.10%), and both fell short of the convolutional network (90.99%). This suggests that RBM pretraining can supply a useful nonlinear feature space for simple linear models, but it does not substitute for end-to-end supervised feature learning on Fashion MNIST.

References

Akiba, Takuya, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. “Optuna: A Next-Generation Hyperparameter Optimization Framework.” In The 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2623–31.
Aslan, Narin, Sengul Dogan, and Gonca Ozmen Koca. 2023. “Automated Classification of Brain Diseases Using the Restricted Boltzmann Machine and the Generative Adversarial Network.” Engineering Applications of Artificial Intelligence 126: 106794.
Fiore, Ugo, Francesco Palmieri, Aniello Castiglione, and Alfredo De Santis. 2013. “Network Anomaly Detection with the Restricted Boltzmann Machine.” Neurocomputing 122: 13–23.
Fischer, Asja, and Christian Igel. 2012. “An Introduction to Restricted Boltzmann Machines.” In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 17th Iberoamerican Congress, CIARP 2012, Buenos Aires, Argentina, September 3-6, 2012. Proceedings 17, 14–36. Springer.
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.
Hinton, Geoffrey. 2010. “A Practical Guide to Training Restricted Boltzmann Machines.” Momentum 9 (1): 926.
Hinton, Geoffrey E. 2002. “Training Products of Experts by Minimizing Contrastive Divergence.” Neural Computation 14 (8): 1771–1800.
Melko, Roger G, Giuseppe Carleo, Juan Carrasquilla, and J Ignacio Cirac. 2019. “Restricted Boltzmann Machines in Quantum Physics.” Nature Physics 15 (9): 887–92.
Ning, Lin, Randall Pittman, and Xipeng Shen. 2018. “LCD: A Fast Contrastive Divergence Based Algorithm for Restricted Boltzmann Machine.” Neural Networks 108: 399–410.
Oh, Sangchul, Abdelkader Baggag, and Hyunchul Nha. 2020. “Entropy, Free Energy, and Work of Restricted Boltzmann Machines.” Entropy 22 (5): 538.
Peng, Chao-Ying Joanne, Kuk Lida Lee, and Gary M Ingersoll. 2002. “An Introduction to Logistic Regression Analysis and Reporting.” The Journal of Educational Research 96 (1): 3–14.
Salakhutdinov, Ruslan, Andriy Mnih, and Geoffrey Hinton. 2007. “Restricted Boltzmann Machines for Collaborative Filtering.” In Proceedings of the 24th International Conference on Machine Learning, 791–98.
Smolensky, Paul et al. 1986. “Information Processing in Dynamical Systems: Foundations of Harmony Theory.”
Xiao, Han, Kashif Rasul, and Roland Vollgraf. 2017. “Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms.” August 28, 2017. https://arxiv.org/abs/cs.LG/1708.07747.
Zaharia, Matei, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, et al. 2018. “Accelerating the Machine Learning Lifecycle with MLflow.” IEEE Data Eng. Bull. 41 (4): 39–45.
Zhang, Nan, Shifei Ding, Jian Zhang, and Yu Xue. 2018. “An Overview on Restricted Boltzmann Machines.” Neurocomputing 275: 1186–99.